Abstract



We often encounter probability distributions given as unnormalized products of non-negative
functions. The factorization structures of these probability distributions are represented by
hypergraphs called factor graphs. Such distributions appear in various fields, including
statistics, artificial intelligence, statistical physics, and error correcting codes.

Given such a distribution, computations of marginal distributions and the normalization
constant are often required. However, these computations are intractable in general because
their cost grows exponentially with the number of variables. One empirically successful and
tractable approximation method is the Loopy Belief Propagation (LBP) algorithm.

The focus of this thesis is an analysis of the LBP algorithm. If the factor graph is a
tree, i.e., has no cycles, the algorithm gives the exact quantities rather than approximations. If
the factor graph has cycles, however, the LBP algorithm does not give exact results and
may exhibit oscillatory and non-convergent behavior. The thematic question of this
thesis is "How is the behavior of the LBP algorithm affected by the discrete geometry
of the factor graph?" Here, "discrete geometry" means the geometry of the factor
graph as a space.

The primary contribution of this thesis is the discovery of a formula called the Bethe-zeta
formula, which establishes a relation between the LBP algorithm, the Bethe free energy and
the graph zeta function. This formula provides new techniques for the analysis of the LBP
algorithm, connecting properties of the graph with those of LBP and the Bethe free energy.
We demonstrate applications of these techniques to several problems, including the (non-)convexity
of the Bethe free energy and the uniqueness and stability of the LBP fixed point.

We also discuss the loop series initiated by Chertkov and Chernyak (2006). The loop 
series is a subgraph expansion of the normalization constant, or partition function, and 
reflects the graph geometry. We investigate theoretical properties of the series. Moreover, we
show a partial connection between the loop series and the graph zeta function. 
Notational Conventions

General Notation

$M(r_1, r_2)$               set of $r_1 \times r_2$ matrices
$\langle x, y \rangle$      inner product of vectors $x$ and $y$, $\sum_i x_i y_i$
$\| \cdot \|$               norm
Spec($X$)                   set of eigenvalues of matrix $X$
$\rho(X)$                   spectral radius of $X$
diag($x$)                   diagonal matrix with diagonal elements $x_i$
$\nabla^2$                  Hessian operator
int $G$                     interior of a set $G$
sgn($x$)                    sign of a real value $x$
$E_p[\phi]$                 expectation of $\phi(x)$ under $p$
$\mathrm{Cov}_p[\phi, \phi']$   covariance of $\phi(x)$ and $\phi'(x)$ under $p(x)$
$\mathrm{Cor}_p[\phi, \phi']$   correlation of $\phi(x)$ and $\phi'(x)$ under $p(x)$
$\mathrm{Var}_p[\phi]$      variance of $\phi(x)$ under $p(x)$
Graphs

$G$           undirected graph $G = (V, E)$
$V$           vertex set
$E$           edge set
$H$           hypergraph $H = (V, F)$
$F$           hyperedge set
$N_i$         neighborhood of vertex $i$
$N_\alpha$    neighborhood of hyperedge $\alpha$
$d_i$         degree of vertex $i$
$d_\alpha$    degree of hyperedge $\alpha$
core($H$)     core of hypergraph $H$
$e$           directed edge (undirected edge in Chapter 7)




Graphical models

$\Psi = \{\Psi_\alpha\}$    set of compatibility functions, graphical model
$x_i$                       random variable on $i \in V$
                            value set of $x_i$
X
$Z$                         partition function, normalization constant




Exponential families

$\mathcal{E}$     exponential family
$\phi(x)$         sufficient statistics
                  log partition function
                  negative entropy
$\Lambda$         Legendre map
$\Theta$          set of natural parameters
$Y$               set of expectation parameters
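Inference family, LBP

$\mathcal{I} = \{\mathcal{E}_\alpha, \mathcal{E}_i\}_{\alpha \in F, i \in V}$
                  inference family
$\phi_\alpha(x_\alpha) = (\phi_{\langle\alpha\rangle}(x_\alpha), \phi_{i_1}(x_{i_1}), \ldots, \phi_{i_{d_\alpha}}(x_{i_{d_\alpha}}))$
                  sufficient statistics of $\mathcal{E}_\alpha$
$\theta_\alpha = (\theta_{\langle\alpha\rangle}, \theta_{\alpha:i_1}, \ldots, \theta_{\alpha:i_{d_\alpha}})$
                  natural parameters of $\mathcal{E}_\alpha$
$\eta_\alpha = (\eta_{\langle\alpha\rangle}, \eta_{\alpha:i_1}, \ldots, \eta_{\alpha:i_{d_\alpha}})$
                  expectation parameters of $\mathcal{E}_\alpha$
$\Theta_\alpha$   set of natural parameters of $\mathcal{E}_\alpha$, $\theta_\alpha \in \Theta_\alpha$
$Y_\alpha$        set of expectation parameters of $\mathcal{E}_\alpha$, $\eta_\alpha \in Y_\alpha$
                  global exponential family
$F$               type 1 Bethe free energy function
$\mathcal{F}$     type 2 Bethe free energy function
$L(X)$            domain of type 1 BFE function
                  domain of type 2 BFE function
$m_{\alpha \to i}$      message from $\alpha$ to $i$
$\mu_{\alpha \to i}$    natural parameter of $m_{\alpha \to i}$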




Graph zeta

$\vec{E}$                 set of directed edges
$s(e)$                    starting factor of directed edge $e$
$t(e)$                    terminus vertex of directed edge $e$
$e \rightharpoonup e'$    $t(e) \in s(e')$ and $t(e) \neq t(e')$
                          set of prime cycles of hypergraph $H$
$p$                       prime cycle
$r_e$                     positive integer (dimension) associated with $e$
$r_i$                     positive integer (dimension) associated with $i \in V$
$\mathfrak{X}(\vec{E})$   set of $\mathbb{C}^{r_e}$ valued functions on $\vec{E}$
$\mathfrak{X}(V)$         set of $\mathbb{C}^{r_i}$ valued functions on $V$
A*(u)                     linear operator on $\mathfrak{X}(\vec{E})$, defined in Eq. (3.4)
$\iota(u)$                linear operator on $\mathfrak{X}(\vec{E})$, defined in Eq. (3.9)
$U_\alpha$                diagonal block of $I + \iota(u)$
<->i                      element of $W_\alpha = U_\alpha^{-1}$
$\mathcal{D}, \mathcal{W}$   linear operators on $\mathfrak{X}(V)$, defined in Eq. (3.12)
$\zeta_H$                 zeta function of a hypergraph $H$
$Z_G$                     zeta function of a graph $G$
Chapter 1



Introduction 



This chapter provides background on the main topics of this thesis and motivates our approach:
discrete geometric analysis. Formal definitions and problem settings are given in later
chapters. The first section gives a short introduction to graphical models, the main
objects discussed in this thesis, and explains the important associated computational tasks, i.e.,
the computation of marginal distributions and the partition function. The second section
introduces an efficient and powerful approximation algorithm, Loopy Belief Propagation
(LBP), which is thoroughly analyzed in this thesis. This section also explains the importance
of considering the graph geometry for the analysis. In this thesis, we refer to "graph
geometry" as discrete geometry. In the third section, we discuss the discrete geometry
that should be considered in the context of LBP. We first review how the interplay between
geometric spaces and the objects defined on them has often appeared in the history of
mathematics. We also review tools in graph theory devised for understanding graphs. The final
section describes the organization of this thesis and gives a short summary of
each chapter.



1.1 Graphical models 

1.1.1 Introduction of graphical models 

A graphical model consists of a set of random variables whose dependence structure
is represented by a certain type of graph. There are many classes of graphical models, such
as pairwise models and Bayesian networks. Among them, factor graph models include
wide classes of graphical models and express the factorization structures that are needed for
our purpose.

Figure 1.1: The factor graph associated with the factorization Eq. (1.1).

Here we give an example of a factor graph model. Let $x = (x_1, x_2, x_3, x_4)$ be random
variables and assume that the probability density function of $x$ is given in the factorized form

$$p(x) = \frac{1}{Z} \Psi_{123}(x_1, x_2, x_3) \Psi_{134}(x_1, x_3, x_4), \tag{1.1}$$

where $\Psi_{123}$ and $\Psi_{134}$ are non-negative functions called compatibility functions and $Z$ is
the normalization constant. This factorization structure is conveniently depicted by a factor
graph; the factor graph for this example is given in Figure 1.1. Each square represents a
compatibility function and each circle represents a variable. If a compatibility function has a
variable as an argument, the corresponding square and circle are joined by an edge.

Formally, in this thesis, a graphical model refers to a set of compatibility functions,
which defines a probability distribution by their product. For general graphical models,
the factor graph is drawn in the obvious way: draw squares and circles corresponding
to the compatibility functions and variables respectively, and join them if a variable is an
argument of a compatibility function. Usually, the index sets of variables and compatibility
functions are denoted by $V$ and $F$ respectively.

Note that, if all the compatibility functions have exactly two variables as arguments, the
factor graph is more simply represented by an undirected graph $G = (V, E)$. Indeed, we
can replace each square and its two edges in the factor graph by a single edge without loss
of information.
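As a minimal sketch of the construction described above, the following Python code stores a factor graph as a mapping from factor labels to the variables they take as arguments, lists the square-circle edges, and tests whether the factor graph is a tree. The function names and the dictionary representation are illustrative assumptions, not notation used elsewhere in this thesis.

```python
from collections import deque

def factor_graph_edges(factors):
    """Return the (factor, variable) edges of the factor graph."""
    return [(a, i) for a, scope in factors.items() for i in scope]

def is_tree(factors):
    """A graph is a tree iff it is connected and has |nodes| - 1 edges."""
    edges = factor_graph_edges(factors)
    variables = {i for _, i in edges}
    nodes = {("f", a) for a in factors} | {("v", i) for i in variables}
    if len(edges) != len(nodes) - 1:
        return False
    adjacency = {node: [] for node in nodes}
    for a, i in edges:
        adjacency[("f", a)].append(("v", i))
        adjacency[("v", i)].append(("f", a))
    # Breadth-first search to check connectivity.
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in adjacency[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(nodes)

# Two factors over (x1, x2, x3) and (x1, x3, x4): they share x1 and x3,
# so the factor graph contains a cycle.
example = {"123": (1, 2, 3), "134": (1, 3, 4)}
# A chain of two pairwise factors, which is a tree.
chain = {"12": (1, 2), "23": (2, 3)}
print(is_tree(example), is_tree(chain))
```

The first model has two factors sharing two variables, which already creates a cycle in the factor graph.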
1.1.2 Important computational tasks

When a graphical model $\Psi = \{\Psi_\alpha\}_{\alpha \in F}$ is given by

$$p(x) \propto \prod_{\alpha \in F} \Psi_\alpha(x_\alpha), \qquad x_\alpha = (x_i)_{i \in \alpha}, \tag{1.2}$$

one sometimes needs to compute marginal distributions over relatively small subsets of the
variables. For example, the marginal distribution of $x_1$ is

$$p_1(x_1) = \sum_{x \setminus \{x_1\}} p(x). \tag{1.3}$$

It is also important to compute the partition function, i.e., the normalization constant:

$$Z = \sum_{x} \prod_{\alpha \in F} \Psi_\alpha(x_\alpha). \tag{1.4}$$

Several examples that need these computations are given in the next subsection.

Despite their importance, the computations of marginal distributions and the partition
function are often infeasible, especially if the number of variables is large and the
ranges of the variables are discrete. Assuming that the variables are binary, one observes
that the direct computation of each of these quantities requires $O(2^{|V|})$ sums. In fact,
in the discrete variable case, the problem of computing partition functions is NP-hard
[5, 28]. Therefore, we need to develop approximation methods that give accurate results for
graphical models appearing in the real world.
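To make the cost of direct computation concrete, here is a minimal sketch that evaluates the partition function and all one-variable marginals by explicit enumeration. The function name `brute_force` and the representation of factors as (scope, function) pairs are assumptions made for illustration; the loop visits every joint state, so the running time grows exponentially with the number of variables.

```python
import itertools
import math

def brute_force(n_vars, factors, states=(-1, 1)):
    """Compute Z and all single-variable marginals by enumeration.

    factors: list of (scope, function) pairs; scope is a tuple of variable
    indices and function maps their values to a non-negative number.
    """
    Z = 0.0
    marginals = [{s: 0.0 for s in states} for _ in range(n_vars)]
    # The loop below has len(states) ** n_vars iterations.
    for x in itertools.product(states, repeat=n_vars):
        weight = math.prod(f(*(x[i] for i in scope)) for scope, f in factors)
        Z += weight
        for i, xi in enumerate(x):
            marginals[i][xi] += weight
    for table in marginals:
        for s in table:
            table[s] /= Z
    return Z, marginals

# Hypothetical compatibility functions on four binary variables.
factors = [
    ((0, 1, 2), lambda a, b, c: math.exp(0.3 * a * b * c)),
    ((0, 2, 3), lambda a, c, d: math.exp(0.5 * a * c - 0.2 * d)),
]
Z, marginals = brute_force(4, factors)
print(Z, marginals[0])
```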

It is noteworthy that exact computation is sometimes feasible using techniques for
reducing the computational cost. A major approach is the junction tree algorithm [30], which
constructs a tree from the associated graph. This algorithm requires a computational cost
exponential in the largest clique size of the triangulated graph. Rather than the junction
tree algorithm, we analyze the LBP algorithm in this thesis. One reason is that the largest
clique is often too big for the junction tree algorithm to run in practical time,
even when the LBP algorithm can be executed quickly. Another reason is more theoretical;
we would like to capture graph geometry as discrepancies between local computations and
global computations. Since the junction tree algorithm reduces the problem to a tree, which
has globally trivial geometry, it does not have such an aspect.
1.1.3 Examples and applications of graphical models

This subsection gives examples of graphical models and explains the importance of the
partition function and/or marginal distributions in each case. Typically, a graphical model is
used to encode our knowledge of the dependencies among random variables.

One example comes from statistical physics. Let us consider the following form of
graphical model on $G = (V, E)$, called the (disordered) Ising model, spin-glass model or binary
pairwise model:

$$p(x) = \frac{1}{Z} \exp\Big( \sum_{ij \in E} J_{ij} x_i x_j + \sum_{i \in V} h_i x_i \Big), \tag{1.5}$$

where $x_i = \pm 1$. This model abstracts the behavior of spins placed on the vertices of the
graph. Each spin has two states (up and down) and interacts only with its neighbors. The
importance of the one-variable marginal distributions is clear because they describe the
probabilities of the states of the spins. The importance of the partition function may be less
obvious. One reason comes from its logarithmic derivatives; the expected values and
correlations of the variables are given by the derivatives of the log partition function:

$$\frac{\partial \log Z}{\partial J_{ij}} = E_p[x_i x_j], \qquad \frac{\partial \log Z}{\partial h_i} = E_p[x_i]. \tag{1.6}$$

Important physical quantities such as the energy and entropy are easily calculated from the log
partition function, or equivalently the Gibbs free energy [92].
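The derivative relations for the log partition function can be checked numerically on a small model. The sketch below is only an illustration under assumed parameter values: it computes $\log Z$, the means and the pair correlations of a three-spin model by enumeration and compares them with central finite differences of $\log Z$. All function names are hypothetical.

```python
import itertools
import math

def ising_stats(n, J, h):
    """Return (log Z, list of E[x_i], dict of E[x_i x_j]) by enumeration.

    J: dict mapping edges (i, j) to couplings; h: list of local fields.
    """
    Z = 0.0
    mean = [0.0] * n
    corr = {edge: 0.0 for edge in J}
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
                     + sum(hi * xi for hi, xi in zip(h, x)))
        Z += w
        for i in range(n):
            mean[i] += w * x[i]
        for (i, j) in J:
            corr[(i, j)] += w * x[i] * x[j]
    return (math.log(Z), [m / Z for m in mean],
            {edge: c / Z for edge, c in corr.items()})

def dlogZ_dh(n, J, h, i, eps=1e-6):
    """Central finite difference of log Z with respect to h_i."""
    h_plus, h_minus = list(h), list(h)
    h_plus[i] += eps
    h_minus[i] -= eps
    return (ising_stats(n, J, h_plus)[0] - ising_stats(n, J, h_minus)[0]) / (2 * eps)

def dlogZ_dJ(n, J, h, edge, eps=1e-6):
    """Central finite difference of log Z with respect to J_ij."""
    J_plus, J_minus = dict(J), dict(J)
    J_plus[edge] += eps
    J_minus[edge] -= eps
    return (ising_stats(n, J_plus, h)[0] - ising_stats(n, J_minus, h)[0]) / (2 * eps)

# Hypothetical couplings and fields on a triangle.
J = {(0, 1): 0.8, (1, 2): -0.5, (0, 2): 0.3}
h = [0.2, -0.1, 0.4]
_, mean, corr = ising_stats(3, J, h)
print(dlogZ_dh(3, J, h, 0), mean[0])
print(dlogZ_dJ(3, J, h, (0, 1)), corr[(0, 1)])
```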

Another example comes from error correcting codes. From the original information,
the sender generates a binary sequence of a certain class, called a codeword, and transmits it
through a noisy channel [101]. If the number of errors is relatively small, the receiver
can correct them using the added redundancy. The decoding process can be formulated as an
inference problem of finding a plausible codeword. In linear codes, a codeword is made to
satisfy local constraints, i.e., the sum of certain subsets of bits is equal to zero in modulo
two arithmetic. Then the probabilities of codewords are given by a graphical model. The
marginals can be used as an estimate of each bit. LDPC codes and turbo codes belong
to this class of codes [SU [751 HO].

We also find examples in the field of image processing. In the super-resolution problem,
one would like to infer a high resolution image from multiple low resolution images [1Q^
114]. The high resolution image can be interpreted as a graphical model imposing local
constraints on pixels. The marginal distributions of the model give the inferred image. The
compatibility functions are often learned from training images [39, 50].

Another example is found in statistics and artificial intelligence. Domain-specific knowledge
of experts can be structured and quantified using graphical models. One can then perform
inference from new information by processing the model. Such a system is called an expert
system [30]. For example, medical knowledge and experience can be interpreted as a graphical
model. A smoker is more likely to have lung cancer or bronchitis than
a non-smoker. This empirical knowledge is represented by a compatibility function having
variables "smoking," "lung cancer" and "bronchitis." Moreover, medical experts have a lot
of data on the relations between diseases, symptoms and other information. Utilizing such
knowledge, statisticians can build a graphical model over the variables related to medical
diagnosis, such as "smoking," "lung cancer," "bronchitis," "dyspnoea," "cough," etc. If a
new patient who is coughing and is a non-smoker arrives, the probability of being bronchitic is
the marginal probability of the graphical model with the observed variables fixed. This example
is taken from a book by Cowell et al. [30] and is called CH-ASIA.

Moreover, there are many general computational problems that reduce to computations
of partition functions. Indeed, the counting problems of perfect matchings,
graph colorings and SAT are equivalent to evaluating the partition functions of certain
classes of graphical models. Computation of the permanent of a matrix is also translated into
the partition function of a graphical model on a complete bipartite graph. The partition
function of the perfect matching problem will be discussed in Section 6.4.

1.1.4 Approximation methods 

Because of the computational difficulty, problem settings in the language of graphical models 
are useful only if such a formulation is combined with efficient algorithms. In this subsection, 
we list approximation approaches other than the loopy belief propagation algorithm, which
is comprehensively discussed in the next section. 

Mean field approximation 

One of the simplest approximation schemes is the (naive) mean field approximation [96]. For
simplicity, let us consider the binary pairwise model Eq. (1.5) on a graph $G = (V, E)$. Let
$m_i = E[x_i]$ be the mean. We approximate the partition function by replacing the state of
the nearest neighbor variables by their means:

$$Z \approx \sum_{x} \exp\Big( \sum_{ij \in E} J_{ij} (x_i m_j + m_i x_j) + \sum_{i \in V} h_i x_i \Big).$$

From $E[x_i] = \frac{\partial \log Z}{\partial h_i}$, we obtain constraints called the self-consistent equations:

$$m_i = \tanh\Big( \sum_{j \in N_i} J_{ij} m_j + h_i \Big). \tag{1.7}$$

The solution of this equation gives an approximation of the means.

This approximation can also be formulated as a variational problem [71]. Let $p$ be the
probability distribution in Eq. (1.5) and let $q$ be a fully decoupled distribution with means $m_i$:

$$q(x) = \prod_{i \in V} \frac{1 + m_i x_i}{2}. \tag{1.8}$$

The variational problem is the minimization of the Kullback-Leibler divergence

$$D(q \| p) = \sum_{x} q(x) \log \frac{q(x)}{p(x)}$$

with respect to $q$. One observes that the condition $\frac{\partial}{\partial m_i} D(q \| p) = 0$ is equivalent to Eq. (1.7).
Therefore, this variational problem is equivalent to the mean field approximation method.

Empirically, this approximation gives good results especially for large and densely connected
graphical models with relatively weak interactions [96, 66]. However, the full
factorization assumption Eq. (1.8) is often too strong to capture the structure of the true
distribution, yielding a poor approximation. One approach for correction is the structured
mean field approximation [102], which extends the region of variation to sub-tree structured
distributions, keeping computational tractability [122].
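The self-consistent equations of the naive mean field approximation can be solved by simple fixed-point iteration. The following is a minimal sketch, assuming a damped update and hypothetical parameter values; the damping factor and the iteration count are illustrative choices, and convergence is not guaranteed for strong couplings.

```python
import math

def mean_field(J, h, n_iter=200, damping=0.5):
    """Iterate m_i <- tanh(sum_j J_ij m_j + h_i) with damping.

    J: symmetric dict of dicts, J[i][j] is the coupling on edge ij.
    h: list of local fields. Returns the list of approximate means.
    """
    n = len(h)
    m = [0.0] * n
    for _ in range(n_iter):
        new = [math.tanh(sum(Jij * m[j] for j, Jij in J.get(i, {}).items()) + h[i])
               for i in range(n)]
        # Damped update: average the old and the new values.
        m = [damping * old + (1 - damping) * fresh for old, fresh in zip(m, new)]
    return m

# Hypothetical weakly coupled triangle.
J = {0: {1: 0.2, 2: 0.2}, 1: {0: 0.2, 2: 0.2}, 2: {0: 0.2, 1: 0.2}}
h = [0.1, -0.2, 0.3]
print(mean_field(J, h))
```

For weak couplings such as these, the update map is a contraction, so the iteration settles on the unique solution of the self-consistent equations.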



Randomized approximations 

Another line of approximation is randomized (or Monte Carlo) methods. For the computation
of a marginal probability distribution, one can generate a stochastic process that converges
to the distribution [41]. The partition function can also be computed by sampling. For the
ferromagnetic (attractive) case ($J_{ij} > 0$), the partition function of the Ising model Eq. (1.5)
is accurately approximated in polynomial time [BH]. More precisely, the algorithm is a
fully polynomial randomized approximation scheme (FPRAS). One major disadvantage of
these methods is that they are often too slow for practical purposes [71]. In this thesis, we
do not focus on such randomized approaches.



1.2 Loopy belief propagation 

1.2.1 Introduction to (loopy) belief propagation 

Though the evaluation of marginal distributions and the partition function is intractable
in general, there is a tractable class of graph structures: trees. A graph is called a tree
if it is connected and does not contain cycles. In 1982, Judea Pearl proposed an efficient
algorithm for the calculation of marginal distributions on tree structured models, called Belief
Propagation (BP) [98, 99]. Roughly speaking, belief propagation is a message passing
algorithm, i.e., a message vector is associated with each direction of an edge and updated by
local operations. Since these local operations can be defined irrespective of the global graph
structure, the BP algorithm is directly applicable to graphical models with cycles. This method
is called Loopy Belief Propagation (LBP) and shows empirically successful performance
[89]. In particular, the method works well for sparse graphs without short cycles.

Here we briefly explain the operations of the (loopy) belief propagation algorithm. First let
us consider a pairwise binary model Eq. (1.5) on the tree in Figure 1.2. We write
$\Psi_{ij}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ and $\Psi_i(x_i) = \exp(h_i x_i)$. Then the marginal distribution $p_2$ is given by

$$\begin{aligned}
p_2(x_2) &\propto \sum_{x_1, x_3, x_4, x_5} \Psi_{12} \Psi_{23} \Psi_{34} \Psi_{35} \Psi_1 \Psi_2 \Psi_3 \Psi_4 \Psi_5 \\
&= \Psi_2 \Big( \sum_{x_1} \Psi_{12} \Psi_1 \Big) \Big( \sum_{x_3} \Psi_{23} \Psi_3 \Big( \sum_{x_4} \Psi_{34} \Psi_4 \Big) \Big( \sum_{x_5} \Psi_{35} \Psi_5 \Big) \Big).
\end{aligned} \tag{1.9}$$

In the above equality, we used the commutativity and the distributive law of the sum and
the product. If we define messages by

$$m_{1 \to 2}(x_2) := \sum_{x_1} \Psi_{12}(x_1, x_2) \Psi_1(x_1),$$

$$m_{4 \to 3}(x_3) := \sum_{x_4} \Psi_{34}(x_3, x_4) \Psi_4(x_4),$$

$$m_{5 \to 3}(x_3) := \sum_{x_5} \Psi_{35}(x_3, x_5) \Psi_5(x_5)$$

and

$$m_{3 \to 2}(x_2) := \sum_{x_3} \Psi_{23}(x_2, x_3) \Psi_3(x_3) m_{4 \to 3}(x_3) m_{5 \to 3}(x_3),$$

Eq. (1.9) becomes

$$p_2(x_2) \propto \Psi_2(x_2) m_{1 \to 2}(x_2) m_{3 \to 2}(x_2).$$

The partition function is also computed using messages:

$$Z = \sum_{x_2} \Psi_2(x_2) m_{1 \to 2}(x_2) m_{3 \to 2}(x_2). \tag{1.10}$$
Obviously, this method is applicable to arbitrary trees; it is called the belief propagation
algorithm. More formally, define messages for all directed edges $(j \to i)$ inductively by

$$m_{j \to i}(x_i) = \sum_{x_j} \Psi_j(x_j) \Psi_{ij}(x_i, x_j) \prod_{k \in N_j \setminus \{i\}} m_{k \to j}(x_j), \tag{1.11}$$

where $N_j$ is the set of neighboring vertices of $j$. Since we are considering a tree, this equation
uniquely determines the messages. The marginal distribution of $x_i$ and the partition
function are given by

$$p_i(x_i) \propto \Psi_i(x_i) \prod_{j \in N_i} m_{j \to i}(x_i), \tag{1.12}$$

$$Z = \sum_{x_i} \Psi_i(x_i) \prod_{j \in N_i} m_{j \to i}(x_i). \tag{1.13}$$

These computations require only $O(|E|)$ steps. Therefore, the marginals and the partition
function of a graphical model associated with a tree can be computed in practical time.

Secondly, let us consider the case where the underlying graph is not a tree. In this case,
Eq. (1.11) does not determine the messages explicitly. However, we can regard Eq. (1.11) as an
equation for the messages and obtain a set of messages as a solution. Though this equation
possibly has many solutions, we take one solution that is obtained by iterative applications of
Eq. (1.11). In other words, we use the equation as an update rule for the messages and find a
fixed point. Then, at a fixed point, the approximation of a marginal distribution is given by
Eq. (1.12). This method is called loopy belief propagation. The approximation of the partition
function is slightly more involved; we will explain it in the next subsection.
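The iterative use of the message update can be sketched in a few lines of Python for the binary pairwise model. This is a minimal illustration rather than a reference implementation: the parallel update schedule, the uniform initialization, the per-message normalization and the fixed number of iterations are all assumptions, and the couplings and fields in the usage example are hypothetical. On a tree the resulting beliefs coincide with the exact marginals; on a graph with cycles they are only approximations and the iteration may fail to converge.

```python
import itertools
import math

def loopy_bp(n, J, h, n_iter=100):
    """Iterate the message update for the binary pairwise model.

    J: dict mapping undirected edges (i, j) to couplings; h: list of fields.
    Returns approximate marginals as a list of dicts {x_i: probability}.
    """
    states = (-1, 1)
    nbrs = {i: set() for i in range(n)}
    coupling = {}
    for (i, j), Jij in J.items():
        nbrs[i].add(j)
        nbrs[j].add(i)
        coupling[(i, j)] = coupling[(j, i)] = Jij
    # msg[(j, i)][x_i] holds the message from j to i, initialised uniformly.
    msg = {(j, i): {s: 0.5 for s in states} for i in nbrs for j in nbrs[i]}
    for _ in range(n_iter):
        new = {}
        for (j, i) in msg:
            out = {}
            for xi in states:
                total = 0.0
                for xj in states:
                    incoming = math.prod(msg[(k, j)][xj] for k in nbrs[j] if k != i)
                    total += math.exp(h[j] * xj + coupling[(j, i)] * xi * xj) * incoming
                out[xi] = total
            norm = sum(out.values())  # normalise for numerical stability
            new[(j, i)] = {s: v / norm for s, v in out.items()}
        msg = new
    beliefs = []
    for i in range(n):
        b = {xi: math.exp(h[i] * xi) * math.prod(msg[(j, i)][xi] for j in nbrs[i])
             for xi in states}
        norm = sum(b.values())
        beliefs.append({s: v / norm for s, v in b.items()})
    return beliefs

# A tree with edges 1-2, 2-3, 3-4, 3-5 (zero-based indices below).
tree_J = {(0, 1): 0.7, (1, 2): -0.4, (2, 3): 0.9, (2, 4): 0.3}
tree_h = [0.1, -0.2, 0.3, 0.0, -0.5]
print(loopy_bp(5, tree_J, tree_h)[1])
```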

1.2.2 Variational formulation of LBP 

At first sight, loopy belief propagation looks groundless because it is just a diversion
of belief propagation, which is guaranteed to work only on trees. However, Yedidia
et al. [135] have shown its equivalence to the Bethe approximation, putting the algorithm
on concrete theoretical ground. More precisely, the LBP algorithm is formulated as a
variational problem of the Bethe free energy function.

Again, we explain the variational formulation in the case of the model Eq. (1.5). Let
$b = \{b_{ij}, b_i\}_{ij \in E, i \in V}$ be a set of pseudomarginals, i.e., functions satisfying
$\sum_{x_i} b_{ij}(x_i, x_j) = b_j(x_j)$, $\sum_{x_i, x_j} b_{ij}(x_i, x_j) = 1$ and $b_{ij}(x_i, x_j) > 0$. The Bethe free energy
function is defined on this set by

$$\begin{aligned}
F(b) = {} & - \sum_{ij \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \log \Psi_{ij}(x_i, x_j) - \sum_{i \in V} \sum_{x_i} b_i(x_i) \log \Psi_i(x_i) \\
& + \sum_{ij \in E} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \log b_{ij}(x_i, x_j) + \sum_{i \in V} (1 - d_i) \sum_{x_i} b_i(x_i) \log b_i(x_i),
\end{aligned} \tag{1.14}$$

where $d_i = |N_i|$ is the number of neighboring vertices of $i$. Note that this function is
not convex in general and possibly has multiple minima. The result of [135] says that the
stationary points of this problem correspond to the solutions of loopy belief propagation.
(The positive definiteness of the Hessian of the Bethe free energy function will be discussed
in Section 4.3. The uniqueness of the LBP fixed point will be discussed in Chapter 5.)
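As a sanity check of the definition, the sketch below evaluates the Bethe free energy of the binary pairwise model at the exact marginals of a small tree, where it should equal $-\log Z$ because the distribution on a tree factorizes as $\prod_{ij} b_{ij} \prod_i b_i^{1 - d_i}$. The function names and parameter values are hypothetical, and the exact marginals are obtained by brute-force enumeration only to keep the example self-contained.

```python
import itertools
import math

def bethe_free_energy(n, J, h, b_pair, b_single):
    """Evaluate the Bethe free energy of the binary pairwise model.

    b_pair[(i, j)][(xi, xj)] and b_single[i][xi] are pseudomarginals.
    """
    degree = [0] * n
    F = 0.0
    for (i, j), Jij in J.items():
        degree[i] += 1
        degree[j] += 1
        for (xi, xj), b in b_pair[(i, j)].items():
            if b > 0:
                # b log b - b log Psi_ij, with log Psi_ij = J_ij x_i x_j
                F += b * (math.log(b) - Jij * xi * xj)
    for i in range(n):
        for xi, b in b_single[i].items():
            if b > 0:
                # (1 - d_i) b log b - b log Psi_i, with log Psi_i = h_i x_i
                F += (1 - degree[i]) * b * math.log(b) - b * h[i] * xi
    return F

def exact_marginals(n, J, h):
    """Brute-force log Z and the exact pairwise and single marginals."""
    Z = 0.0
    b_pair = {e: {s: 0.0 for s in itertools.product((-1, 1), repeat=2)} for e in J}
    b_single = [{-1: 0.0, 1: 0.0} for _ in range(n)]
    for x in itertools.product((-1, 1), repeat=n):
        w = math.exp(sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
                     + sum(hi * xi for hi, xi in zip(h, x)))
        Z += w
        for (i, j) in J:
            b_pair[(i, j)][(x[i], x[j])] += w
        for i in range(n):
            b_single[i][x[i]] += w
    for table in list(b_pair.values()) + b_single:
        for key in table:
            table[key] /= Z
    return math.log(Z), b_pair, b_single

# Hypothetical path graph 0 - 1 - 2, which is a tree.
path_J = {(0, 1): 0.6, (1, 2): -0.9}
path_h = [0.3, 0.1, -0.4]
log_Z, b_pair, b_single = exact_marginals(3, path_J, path_h)
print(bethe_free_energy(3, path_J, path_h, b_pair, b_single), -log_Z)
```

On a graph with cycles the two printed numbers generally differ, which is the discrepancy that the later chapters relate to the graph geometry.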

Similar to the case of the mean field approximation, this variational problem can be
viewed as a KL-divergence minimization [136], i.e., if we take the (not necessarily normalized)
distribution

$$q(x) = \prod_{ij \in E} b_{ij}(x_i, x_j) \prod_{i \in V} b_i(x_i)^{1 - d_i}, \tag{1.15}$$

then $D(q \| p) \approx F(b) + \log Z$. Since we expect the KL-divergence to be nearly zero at a
stationary point $b^*$, this relation motivates us to define the approximation $Z_B$ of the partition
function $Z$ by

$$\log Z_B := -F(b^*). \tag{1.16}$$

1.2.3 Applications of LBP algorithm

Since LBP is essentially equivalent to the Bethe approximation, its application dates back
to the 1930s when Bethe invented the Bethe approximation [TS]. In 1993, Berrou et al. [13]
proposed a novel method of error correcting codes and found its excellent performance. This
algorithm was later found to be a special case of LBP by McEliece and Cheng [ST]. This
discovery made the LBP algorithm popular. Soon after that, the LBP algorithm was successfully
applied to other problems including computer vision problems and medical diagnosis
[39, 89]. Since then, the scope of application of the LBP algorithm has been expanding. For
example, LBP has many applications in image processing such as super-resolution [19], estimation
of human pose [59] and image reconstruction [117]. Gaussian loopy belief propagation is
also used for solving linear equations [105] and linear programming [16].
1.2.4 Past research and our approach

In this subsection, we review past theoretical research on LBP, motivating further
analysis. A large amount of research has been carried out to gain a
better understanding of the LBP algorithm.

The behavior of LBP is complicated in general, in accordance with the non-convexity of the Bethe
free energy. LBP possibly has many fixed points and, furthermore, may not converge. For
discrete variable models, because of the lower-boundedness of the Bethe free energy, at least
one fixed point is guaranteed to exist [136]. Fixed points are not necessarily unique in
general, but for trees and one-cycle graphs, the fixed point is guaranteed to be unique [128].
This fact motivates the analysis of classes of graphs that have a unique fixed point. Each LBP
fixed point is a solution of a nonlinear equation associated with the graph. Therefore, the
problem of the uniqueness of the LBP fixed point is that of the uniqueness of the solution of
this equation. In the next section we discuss the history of this kind of problem in mathematics
to show an alternative origin of our research.

As mentioned above, the algorithm does not necessarily converge and often shows oscillatory
behavior. Concerning discrete variable models, Mooij [87] gives a sufficient
condition for convergence in terms of the spectral radius of a certain matrix related to the
graph geometry. This matrix is the same as the matrix that appears in the (multivariate)
Ihara graph zeta function [110]. The graph zeta function is a popular characteristic of a
graph; it was originally introduced by Ihara [60]. Mooij's result has not been considered from
the viewpoint of the graph zeta function or graph geometry. In this thesis, developing a new
formula, we give a partial answer to why this matrix appears in the sufficient condition for
convergence.

The approximation performance has also been a central issue for understanding the empirical
success of LBP. Since the approximation of marginals for a discrete model is also an NP-hard
problem [31], it seems difficult to obtain high quality error bounds. Therefore, rather than
rigorous bounds, we need to develop an intuitive understanding of the errors. For binary models,
Chertkov and Chernyak [24, 25] derived an expansion called the loop series that expresses the
ratio of $Z$ and its Bethe approximation as a finite sum labeled by a set of subgraphs called
generalized loops. We also derive an expansion of marginals in a similar manner. An
interesting point of the loop series is that the graph geometry explicitly appears in the error
expression, and the non-existence of generalized loops in a tree immediately implies the exactness
of the Bethe approximation and LBP. Concerning the error of marginals, Ikeda et al. [64, 63]
have derived a perturbative analysis of marginals based on information geometric methods
[4]. For Gaussian models, though the problem is not NP-hard, Weiss and Freeman have
shown that the means approximated by LBP are exact but the covariances are not [129].

For the understanding of LBP errors, we follow the loop series approach initiated by Chertkov
and Chernyak. One reason is that the full series is easy to handle because it is a finite sum.
Though the expansion is limited to binary models, it covers important applications such as
error correcting codes.

As discussed in the previous subsection, LBP is interpreted as a minimization problem,
where the objective function is the Bethe free energy. Empirically, Yedidia et al. [135]
found that locally stable fixed points of LBP are local minima of the Bethe free energy
function. For discrete models, Heskes [SI] has shown that stable fixed points of LBP are
local minima of the Bethe free energy function. This fact suggests that LBP finds a locally
good stationary point. From a theoretical point of view, it also suggests that there is
a hidden relation between the LBP update and the local structure of the Bethe free energy.

Analysis of the Bethe free energy itself is also an important issue for the understanding of
LBP. Pakzad et al. [95] have shown that the Bethe free energy is convex if the underlying
graph is a tree or a one-cycle graph. For general graphs, however, the (non-)convexity of the
Bethe free energy has not been comprehensively investigated. As observed from Eq. (1.14), the
Hessian (the matrix of second derivatives) of the Bethe free energy does not depend on the
given compatibility functions, i.e., it is determined only by the graph geometry.

The variational formulation naturally leads to an extension of the LBP algorithm called
Generalized Belief Propagation (GBP), which is equivalent to an extension of the Bethe
approximation: the Kikuchi approximation [135, 72]. Inspired by this result, many modified
variational problems have been proposed. For example, Wiegerinck and Heskes [133] have
proposed a generalization of the Bethe free energy by introducing tuning parameters in the
coefficients. This free energy yields the fractional belief propagation algorithm. Since these
extended variational problems include the variational problem of the Bethe free energy function,
it is still important to understand the Bethe free energy as a starting point.

Finally, we summarize our approach to the analysis of LBP, motivated by past research.
For tree structured graphs, LBP has desirable properties such as the uniqueness of the
solution, exactness and convergence in a finite number of steps. However, as observed in past
research, the existence of cycles breaks such properties. Organizing these fragmented
observations, our analysis tries to build a comprehensive understanding of the relation between
loopy belief propagation and graph geometry. Indeed, past research on LBP has not treated
"graph geometry" in a satisfactory manner; few analyses have derived clear relations going
beyond the tree/non-tree classification. Malioutov et al. [79] have shown that the errors of
Gaussian belief propagation are related to walks on the graph and the universal covering tree,
but this result is limited to the Gaussian case.

1.3 Discrete geometric analysis 

In this thesis, we emphasize the discrete geometric viewpoint, which utilizes graph
characteristics such as the graph zeta function and graph polynomials. First, we introduce another
mathematical background of this thesis: the interplay between geometry and equations. This
viewpoint places our analysis of LBP in a major stream of mathematics. Then, we discuss what
kind of discrete geometry we should consider.

1.3.1 Geometry and Equations 

The fixed point equation of the LBP algorithm, Eq. (1.11), involves messages. The messages
are labeled by the directed edges of the graph and satisfy local relations. Therefore the
structure of the equation is closely related to the graph. Since it is an equation, it is natural
to ask whether there is a solution and, if so, how many solutions there are and what kind of
structure they have. As mentioned in the previous section, if the underlying graph is a tree or
a one-cycle graph, the uniqueness of the LBP solution is easily shown [99, 128].

Equations whose variables are labeled by points in a geometric object have often appeared
in mathematics and formed a major stream [121]. There are many examples that
involve deep relations between the topology of a geometric object and the properties of
equations on it, such as solvability. In this thesis, we emphasize this aspect of the LBP
equation and add a new example to this story.

Here we explain such an interplay by elementary examples. We can start with the
following easy observation.

$$(A) \begin{cases} x = 2x + 3y + z + 3 \\ y = -y + z + z \\ z = 2z - 1 \end{cases} \qquad (B) \begin{cases} x = 2x + 3y + z + 1 \\ y = x - y + z + 2 + 1 \\ z = -x + y + 2z + 3 \end{cases} \tag{1.17}$$

Figure 1.3: Graph representation of the equations

One may immediately find the solution of Eq. (A) by computing $z$, $y$ and $x$ successively.
But one may not find a solution of Eq. (B) without paper and a pencil. The difference is
easily understood from a graphical representation of these equations (Fig. 1.3). The first one
does not include a directed cycle, i.e., a sequence of directed edges that ends at its starting
point, but the latter does.

This type of difference is understood in the following setting. Consider a linear equation
$x = Ax + c$. If $A$ is an upper triangular matrix, this equation is solved with
$O(N^2)$ computations by solving for the variables one by one. However, for a general matrix $A$, the required
computational cost is $O(N^3)$ [35]. The existence of directed cycles makes the equation difficult.
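
The one-by-one solution in the cycle-free case can be sketched as follows. This is a minimal illustration, not part of the thesis: the function name and the coefficient values are hypothetical, and exact rational arithmetic is used only to keep the check transparent. Each variable is obtained from the ones already computed, so the double loop costs $O(N^2)$ operations.

```python
from fractions import Fraction

def solve_upper_triangular_fixed_point(A, c):
    """Solve x = A x + c by back substitution, assuming A is upper triangular."""
    n = len(c)
    x = [Fraction(0)] * n
    # Start from the last variable, which depends on no other variable.
    for i in reversed(range(n)):
        diag = 1 - Fraction(A[i][i])
        if diag == 0:
            raise ValueError("1 - A[i][i] vanishes; no unique solution")
        # Row i reads (1 - A_ii) x_i = c_i + sum_{j > i} A_ij x_j.
        rhs = Fraction(c[i]) + sum(Fraction(A[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = rhs / diag
    return x

# Hypothetical coefficients (illustrative only), variables ordered (x, y, z).
A_example = [[2, 3, 1], [0, -1, 1], [0, 0, 2]]
c_example = [3, 2, -1]
x_example = solve_upper_triangular_fixed_point(A_example, c_example)
```

When $A$ has nonzero entries below the diagonal, i.e. when the dependency graph has a directed cycle, no ordering of the variables allows this sweep and a general linear solver is needed.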

Another easy example of this interrelation is the Laplace equation:

$$\nabla^2 \phi(x) = 0 \quad (x \in \Omega), \qquad \phi(x) = 0 \quad (x \in \partial\Omega). \tag{1.18}$$

Here, the "geometric object" is the region $\Omega \subset \mathbb{R}^n$. This differential equation is characterized
by the variational problem of the energy functional:

$$E[\phi] = \int_{\Omega} |\nabla \phi(x)|^2 \, dx. \tag{1.19}$$

This characterization reminds us of the relation between the LBP fixed point equation and
the variational problem of the Bethe free energy functional. In this analogy, the region
$\Omega$ corresponds to the graph $G = (V, E)$ and the function $\phi$ corresponds to messages, or
equivalently pseudomarginals, on $G$. The variations of the energy functional and the Bethe
free energy give the equation $\nabla^2 \phi(x) = 0$ and the LBP fixed point equation, respectively.

The equation $\nabla^2 \phi(x) = 0$ is local because it is a differential equation. Similarly, the LBP
fixed point equation is also local in the sense that it only involves neighboring messages. An
important difference is that the geometric object $G$ is discrete.

The space of solutions is closely related to the geometry of $\Omega$. For example, let $n = 2$,
$D_1 = \{(x,y) \mid x^2 + y^2 < 1\}$ and $D_2 = \{(x,y) \mid 1 < x^2 + y^2\}$. If $\Omega = D_1$, the only solution is
$\phi = 0$ by the maximum principle [2]. But if $\Omega = D_2$, there is a nonzero solution

$$\phi(x,y) = \ln(x^2 + y^2).$$
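
The claim that $\ln(x^2 + y^2)$ is a nonzero solution outside the unit disk can be checked numerically. The sketch below is only an illustration under stated assumptions: the five-point finite-difference stencil, the step size and the sample points are hypothetical choices, not taken from the thesis.

```python
import math

def phi(x, y):
    """Candidate solution ln(x^2 + y^2) on the exterior of the unit disk."""
    return math.log(x * x + y * y)

def laplacian_fd(f, x, y, h=1e-3):
    """Five-point finite-difference approximation of the Laplacian of f at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / (h * h)

# Sample points with x^2 + y^2 > 1 (hypothetical choices).
sample_points = [(1.5, 0.0), (1.0, 2.0), (-3.0, 0.5)]
laplacians = [laplacian_fd(phi, x, y) for x, y in sample_points]

# Values on the boundary circle x^2 + y^2 = 1, where phi should vanish.
boundary_values = [phi(math.cos(t), math.sin(t)) for t in (0.0, 1.0, 2.5)]
```

The approximate Laplacians are close to zero and the boundary values vanish, while the function itself is not identically zero. The hole in $D_2$ is what makes room for this extra solution.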

The Laplace equation can be generalized to Riemannian manifolds. The
spectrum of the Laplace operator is intimately related to the zeta function of the manifold,
which is defined by a product over prime cycles [114]. Furthermore, there is an analogy
between Riemannian manifolds and finite graphs, and the graph zeta function is known as
a discrete analogue of the zeta function of a Riemannian manifold [1]. The spectrum of a
discrete analogue of the Laplacian is investigated in [28].

It is noteworthy that the LBP fixed point equation is non-linear, whereas the
aforementioned examples are linear equations. The analysis of LBP does not reduce to finite
dimensional linear algebra, e.g., eigenvalues and eigenvectors. This fact potentially produces
a new aspect of analysis on graphs compared to linear algebraic analysis on graphs [43, 36].

1.3.2 What is the geometry of graphs? 

Let us go back to the question: what kind of discrete geometry should we employ to 
understand LBP and the Bethe approximation? We have to think of graph quantities that 
are consistent with properties of LBP and the partition function. In other words, if there 
is some theory that relates the graph geometry and properties of the partition function 
and LBP, they must share some common properties. Such requirements give hints to our 
question. 

One may ask a topologist for a hint. A graph $G = (V, E)$ is indeed a topological space
when each edge is regarded as an interval $[0, 1]$ and the edges are glued together at vertices.
However, basic topology cannot treat the rich properties of graphs, because
it cannot distinguish homotopy equivalent spaces, and the homotopy class of a graph is
determined only by the number of connected components $k(G)$ and the nullity $n(G) =
|E| - |V| + k(G)$. In this sense, graph theory belongs to combinatorics rather than to
topology [80].

For the computation of the partition function, the nullity is closely related to its difficulty.
Let $K$ be the number of states of each variable. We can compute the partition function by
$K^{n(G)}$ sums, because if we cut $n(G)$ edges of $G$, we obtain a tree. But the partition function
on a bouquet graph, i.e. a graph that has one vertex with multiple edges, is easily computed
in $K$ steps. Therefore, for the understanding of the computation and the behavior of LBP,
we need more detailed information about graph geometry that distinguishes graphs with the
same nullity.

We should therefore turn to graph theory. Graph theory has investigated graph
geometry in many senses. There are many graph characteristics, which are invariant with
respect to graph isomorphisms [34, 35]. The most famous example is the Tutte polynomial
[120], which plays an important role in graph theory, a broad range of mathematics and
theoretical computer science. However, this thesis does not discuss the Tutte polynomial
because it does not meet the criteria discussed below.

In the LBP equation, one observes that vertices of degree two can be eliminated without 
changing the structure of the equation, because this is just a variable elimination. On the 
other hand, for the problem of computing the true partition function, one also observes that 
vertices of degree two can be eliminated with low computational cost keeping the partition 
function. 

The operation of eliminating edges with a vertex of degree one also keeps the problem 
essentially invariant, i.e., LBP solutions are invariant under the operation and the true 
partition function does not change up to a trivial factor. 

In this thesis, we consider two objects associated with the graph: the graph zeta function
and the $\Theta$ polynomial. We use them in a multivariate form; in other words, we define them
on (directed) edge-weighted graphs. These quantities have desirable properties consistent with
the above observations. That is, they are

1. "invariant" to removal of a vertex of degree one and the connecting edge.

2. "invariant" to erasure of a vertex of degree two.

The second property needs more explanation. Assume that there is a vertex $j$ of degree
two and its neighbors are $i$ and $k$. If we have directed edge weights $u_{i \to j}$ and $u_{j \to k}$, we can
erase the vertex $j$, taking $u_{i \to k} = u_{i \to j} u_{j \to k}$, keeping the graph zeta function invariant. A
similar result holds for the $\Theta$ polynomial, though its weights are associated with undirected
edges.
The first property immediately implies that these quantities are in some sense trivial if
the graph is a tree. This property reminds us that LBP gives the exact result if the graph
is a tree.

Admittedly, these properties do not uniquely determine the quantities associated with graphs.
However, they give a clue to answer our question: what kind of graph geometry is related to
LBP?

1.4 Overview of this thesis 

The remainder of the thesis is organized in the following manner. 
Chapter 2: Preliminaries 

This chapter sets up the problem formally, introducing hypergraphs, graphical models and 
exponential families. The Loopy Belief Propagation (LBP) algorithm is also introduced 
utilizing the language of exponential families. Our characterizations of LBP fixed points
give an understanding of the Bethe-zeta formula, as discussed in the first section of Chapter 4.

Part I: Graph zeta in Bethe free energy and loopy belief propagation 

The central result of this part is the relation between the Hessian of the Bethe free en- 
ergy and the multivariate graph zeta function. The multivariate graph zeta function is 
a computable characteristic of an edge weighted graph because it is represented by the 
determinant of a matrix indexed by edges. 

The focus of this part is mainly an intrinsic nature of the LBP algorithm and the Bethe 
free energy. Namely, we do not treat the true partition function and the Gibbs free energy. 
Interrelation of such exact quantities and their Bethe approximations is discussed in the 
next part. 

The contents of this part are an extension of the results in [126], where only pairwise
binary models are discussed.

Chapter 3: Graph zeta function 

This chapter develops the graph zeta function and related formulas. First, we introduce
our graph zeta function, unifying known types of graph zeta functions. Second, we show
the Ihara-Bass type determinant formula, which plays an essential role in the next chapter.
Some basic properties of the univariate zeta function, such as the locations of the poles, are also
discussed.

Chapter 4: Bethe-zeta formula 

This chapter presents a new formula, called Bethe-zeta formula, which establishes the re- 
lation between the Hessian of the Bethe free energy function and the graph zeta function. 
This formula is the central result in Part I. The proof of the formula is based on the Ihara- 
Bass type determinant formula and Schur complements of the (inverse) covariance matrices. 
Demonstrating the utility of this formula, we discuss two applications of this formula. The 
first one is the analysis of the positive definiteness and convexity of the Bethe free energy 
function; the second one is the analysis of the stability of the LBP algorithm. 

Chapter 5: Uniqueness of LBP fixed point 

This chapter develops a new approach to the uniqueness problem of the LBP fixed point. 
We first establish an index sum formula and combine it with the Bethe-zeta formula. Our 
main contribution of this chapter is the uniqueness theorem for unattractive (frustrated) 
models on graphs with nullity two. Though these are toy problems, the analysis exploits the 
graph zeta function and is theoretically interesting. This chapter only discusses the binary 
pairwise models but our approach can be basically generalized to multinomial models. 

Part II: Loop Series 

In this part, focusing on binary models, we analyze the relation between the exact quantity, 
such as the partition function and marginal distributions, and their Bethe approximations 
using the loop series technique. The expansion provides graph geometric intuitions of LBP 
errors. 

Chapter 6: Loop series 

Loop Series (LS), developed by Chertkov and Chernyak [24, 25], is an expansion
that expresses the approximation error as a finite sum over a certain class of subgraphs.
The contribution of each term is the product of local contributions, which are easily
calculated from the LBP outputs. First we explain the derivation of the LS in our notation,
which is suitable for the graph polynomial treatment in the next chapter. In the special case
of perfect matching problems, we observe that the loop series has a special form and is
related to the graph zeta function. We also review some applications of the loop series.

Chapter 7: Graph polynomials from Loop Series 

This chapter treats the loop series as a weighted graph characteristic called the theta
polynomial, $\Theta_G(\beta, \gamma)$. Our motivation for this treatment is to "divide the problem in two
parts." The loop series is evaluated in two steps: 1. the computation of $\beta = (\beta_{ij})_{ij \in E}$ and
$\gamma = (\gamma_i)_{i \in V}$ by an LBP solution; 2. the summation of all subgraph contributions. Since the
first step seems difficult, we focus on the second step. If there is an interesting property in
the form of the sum, or the $\Theta$-polynomial, the property should be related to the behavior
of the error of the partition function approximation.

Though we have not been successful in deriving properties of the $\Theta$-polynomial that can be
used to derive properties of the Bethe approximation, we show that the graph polynomials
$\theta_G(\beta, \gamma)$ and $\omega_G(\beta)$, which are obtained by specializing $\Theta_G$, have interesting properties:
deletion-contraction relations. We also discuss partial connections to the Tutte polynomial
and the monomer-dimer partition function. We believe that these results give hints for
future investigations of the $\Theta$-polynomial.

Chapter 8: Conclusion 

This chapter concludes this thesis and suggests some directions for future research.
Appendix 

In Appendix A, we summarize useful mathematical formulas, which are used in the proofs of
this thesis. In Appendix B, we collect topics on LBP which are not necessary for the logical
thread of this thesis, but are helpful for further understanding.
In this chapter, we introduce objects and methods studied in this thesis. Probability
distributions that have "local" factorization structures appear in many fields including physics,
statistics and engineering. Such distributions are called graphical models. Loopy Belief
Propagation (LBP) is an efficient approximation method applicable to inference problems
on graphical models. The focus of this thesis is an analysis of this algorithm applied to arbitrary
graph-structured distributions. We begin in Section 2.1 with elements of hypergraphs as
well as graphical models because the structures associated with these graphical models are,
precisely speaking, hypergraphs. Section 2.2 introduces the LBP algorithm on the basis
of the theory of exponential families. A collection of exponential families, called an inference
family, is utilized to formulate the algorithm. The Bethe free energy, which gives an alternative
language for formulating the approximation by the LBP algorithm, is discussed in
Section 2.3, providing characterizations of LBP fixed points.



2.1 Probability distributions with graph structure 

Probability distributions that are products of "local" functions appear in a variety of fields,
including statistical physics [100, 02], statistics [132], artificial intelligence [99], coding
theory [81, 75, 40], machine learning [70], and combinatorial optimization [82]. Typically,
such distributions come from system modeling of random variables that only have "local"
interactions/constraints. These factorization structures are well visualized by graph
representations, called factor graphs. Furthermore, the structures are cleverly exploited in the
algorithm of LBP.
Figure 2.1: Hypergraph H. Figure 2.2: Two representations.

We start in Subsection 2.1.1 with an introduction to hypergraphs because factor graphs
are indeed hypergraphs. Further theory of hypergraphs is found in [12]. Subsection 2.1.2
formally introduces the factor graph of graphical models with some examples.

2.1.1 Basic definitions of graphs and hypergraphs 

We begin with the definition of (ordinary) graphs. A graph $G = (V, E)$ consists of the vertex
set $V$ and the edge set $E$, whose elements join pairs of vertices. Generalizing the definition of
graphs, we define hypergraphs. A hypergraph $H = (V, F)$ consists of a set of vertices $V$ and a
set of hyperedges $F$. A hyperedge is a non-empty subset of $V$. Fig. 2.1 illustrates a hypergraph
$H = (\{1, 2, 3, 4\}, \{\alpha_1, \alpha_2, \alpha_3\})$, where $\alpha_1 = \{1, 2\}$, $\alpha_2 = \{1, 2, 3, 4\}$ and $\alpha_3 = \{4\}$.

In order to describe the message passing algorithm in Subsection 2.2.3, it is convenient
to identify a relation $i \in \alpha$ with a directed edge $\alpha \to i$. The left of Fig. 2.2 illustrates
this representation of the above example, where squares represent hyperedges. Therefore,
explicitly writing the set of directed edges $E$, a hypergraph $H$ is also denoted by $H =
(V \cup F, E)$.

It is also convenient to represent a hypergraph as a bipartite graph. A graph $G = (V, E)$
is bipartite if the vertices are partitioned into two sets, say $V_1$ and $V_2$, and all edges join
vertices of $V_1$ and $V_2$. A hypergraph $H = (V \cup F, E)$ is identified with a bipartite graph
$B_H = (V \cup F, E_{B_H})$, where $E_{B_H}$ is obtained by forgetting the directions of $E$. (See the
right of Fig. 2.2.)

For any vertex $i \in V$, the neighbors of $i$ are defined by $N_i := \{\alpha \in F \mid i \in \alpha\}$. Similarly,
for any hyperedge $\alpha \in F$, the neighbors of $\alpha$ are defined by $N_\alpha := \{i \in V \mid i \in \alpha\} = \alpha$. The
degrees of $i$ and $\alpha$ are given by $d_i := |N_i|$ and $d_\alpha := |N_\alpha| = |\alpha|$, respectively. A hypergraph
$H = (V, F)$ is called $(a, b)$-regular if $d_i = a$ and $d_\alpha = b$ for all $i \in V$ and $\alpha \in F$. If all the
degrees of hyperedges are equal to two, a hypergraph is naturally identified with a graph.
Figure 2.3: Core of the hypergraph in Fig. 2.2

A walk $W = (i_1, \alpha_1, i_2, \ldots, \alpha_n, i_{n+1})$ of a hypergraph is an alternating sequence of
vertices and hyperedges that satisfies $\alpha_k \supset \{i_k, i_{k+1}\}$, $i_k \neq i_{k+1}$ for $k = 1, \ldots, n$. We say that
$W$ is a walk from $i_1$ to $i_{n+1}$ and has length $n$. A walk $W$ is said to be closed if $i_1 = i_{n+1}$.
A cycle is a closed walk of distinct hyperedges.

A hypergraph $H$ is connected if for every pair of distinct vertices $i, j$ there is a walk
from $i$ to $j$. Obviously, a hypergraph is a disjoint union of connected components. The
number of connected components of $H$ is denoted by $k(H)$. The nullity of a hypergraph $H$
is defined by $n(H) := |E| - |V| - |F| + k(H)$.

Definition 1. A hypergraph H is a tree if it is connected and has no cycle. 

This condition is equivalent to $n(H) = 0$ and $k(H) = 1$. Other characterizations will be
given in Propositions 2.1 and 3.2. Note that this definition of tree is different from the hypertree
known in graph theory and computer science |104|, WI\ . For example, the hypergraph in 
Fig. 2.1 is a hypertree though it is not a tree in our definition.
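
The tree condition can be checked mechanically. The following sketch is an illustration, not the thesis's procedure: it assumes the nullity is that of the bipartite representation $B_H$, namely $|E| - |V| - |F| + k(H)$ with $|E|$ the number of incidences $i \in \alpha$, and it uses a hypothetical dictionary encoding of hyperedges together with a small union-find to count connected components.

```python
def hypergraph_invariants(vertices, hyperedges):
    """Return (k, n): the number of connected components and the nullity.

    hyperedges maps a hyperedge name to the collection of vertices it contains.
    Both are computed on the bipartite representation of the hypergraph.
    """
    nodes = [("v", i) for i in vertices] + [("f", a) for a in hyperedges]
    parent = {u: u for u in nodes}

    def find(u):
        # Union-find root lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    num_incidences = 0
    for a, members in hyperedges.items():
        for i in members:
            num_incidences += 1
            parent[find(("v", i))] = find(("f", a))

    k = len({find(u) for u in nodes})
    n = num_incidences - len(vertices) - len(hyperedges) + k
    return k, n

def is_tree(vertices, hyperedges):
    """A hypergraph is a tree iff it is connected and has nullity zero."""
    k, n = hypergraph_invariants(vertices, hyperedges)
    return k == 1 and n == 0

# The example hypergraph of this subsection.
V_example = [1, 2, 3, 4]
F_example = {"a1": {1, 2}, "a2": {1, 2, 3, 4}, "a3": {4}}
example_invariants = hypergraph_invariants(V_example, F_example)

# Dropping a1 breaks the only cycle (1, a1, 2, a2, 1).
F_without_a1 = {"a2": {1, 2, 3, 4}, "a3": {4}}
```

For the example, the two hyperedges sharing the vertices 1 and 2 form a cycle, so the nullity is one and the hypergraph is not a tree; removing $\alpha_1$ leaves a tree.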

Core of hypergraphs 

Here, we discuss the core of hypergraphs, which gives another characterization of trees. 

The core$^{1}$ of a hypergraph $H = (V, F)$, denoted by $\mathrm{core}(H)$, is the hypergraph
obtained as the union of the cycles of $H$. In other words, $\mathrm{core}(H) = (V', F')$ is given
by $F' = \{\alpha \in F \mid \alpha \text{ is in some cycle of } H\}$ and $V' = \{i \in V \mid i \text{ is in some cycle of } H\}$. A
hypergraph $H$ is said to be a coregraph if $H = \mathrm{core}(H)$. See Fig. 2.3 for an example.

Intuitively, the core of a hypergraph is obtained by removing vertices and hyperedges of
degree one until no such vertex or hyperedge remains. More precisely, the operation
for obtaining the core is as follows. First, for $H = (V \cup F, E)$, find a directed edge $(\alpha \to i) \in
E$ that satisfies $d_\alpha = 1$ or $d_i = 1$. If both conditions are satisfied, remove $\alpha$, $i$ and $(\alpha \to
i)$. If only one of them is satisfied, remove $(\alpha \to i)$ and the degree-one vertex or hyperedge.
Repeat this until no such directed edge remains.

$^{1}$ This term is taken from [109], where the core of graphs is defined. Note that this notion is different from
the core in [43].

The following characterization of trees is immediate from the above definitions.

Proposition 2.1. A connected hypergraph H is a tree if and only if core(H) is the empty 
hypergraph. 
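
The pruning operation described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the thesis's algorithm verbatim: hyperedges are encoded as a hypothetical dictionary from names to vertex sets, and nodes of degree at most one are removed in sweeps rather than one directed edge at a time, which yields the same final hypergraph.

```python
def core(vertices, hyperedges):
    """Prune degree-one vertices and hyperedges until none is left.

    Returns (remaining_vertices, remaining_hyperedges). Nodes that become
    isolated during pruning are removed as well.
    """
    members = {a: set(s) for a, s in hyperedges.items()}
    neighbors = {i: {a for a, s in members.items() if i in s} for i in vertices}
    changed = True
    while changed:
        changed = False
        for i in list(neighbors):
            if len(neighbors[i]) <= 1:
                # Remove vertex i and its (at most one) incidence.
                for a in neighbors.pop(i):
                    members[a].discard(i)
                changed = True
        for a in list(members):
            if len(members[a]) <= 1:
                # Remove hyperedge a and its (at most one) incidence.
                for i in members.pop(a):
                    neighbors[i].discard(a)
                changed = True
    return set(neighbors), {a: frozenset(s) for a, s in members.items()}

# Example hypergraph of Subsection 2.1.1 and a small tree (hypothetical).
core_vertices, core_hyperedges = core([1, 2, 3, 4], {"a1": {1, 2}, "a2": {1, 2, 3, 4}, "a3": {4}})
tree_core = core([1, 2, 3], {"b1": {1, 2}, "b2": {2, 3}})
```

On the example, vertex 3, hyperedge $\alpha_3$ and then vertex 4 are pruned, leaving the cycle through vertices 1 and 2. On the tree, everything is pruned, in agreement with Proposition 2.1.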

2.1.2 Factor graph representation 

Our primary interest is probability distributions that have factorization structures repre- 
sented by hypergraphs. 

Definition 2. Let $H = (V, F)$ be a hypergraph. For each $i \in V$, let $x_i$ be a variable that
takes values in $\mathcal{X}_i$. A probability density function $p$ on $x = (x_i)_{i \in V}$ is said to be graphically
factorized with respect to $H$ if it has the following factorized form

$$p(x) = \frac{1}{Z} \prod_{\alpha \in F} \Psi_\alpha(x_\alpha), \tag{2.1}$$

where $x_\alpha = (x_i)_{i \in \alpha}$, $Z$ is the normalization constant and $\Psi_\alpha$ is a non-negative valued function
called a compatibility function. A set of compatibility functions, giving a graphically factorized
distribution, is called a graphical model. The associated hypergraph $H$ is called the factor
graph of the graphical model.

We often refer to a hypergraph as a factor graph, implicitly assuming that it is associated
with some graphical model. For a factor graph, a hyperedge is usually called a factor. The factor
graph was explicitly introduced in [77].

Any probability distribution on $\mathcal{X} = \prod_i \mathcal{X}_i$ is trivially graphically factorized with respect
to the "one-factor hypergraph," where the unique factor includes all vertices. A factorization is more
informative if it involves factors of small sizes. Our implicit assumption in
the subsequent chapters is that for all factors $\alpha$, $\mathcal{X}_\alpha = \prod_{i \in \alpha} \mathcal{X}_i$ is small enough, in the sense of
cardinality or dimension, to be handled by computers.

A Markov Random Field (MRF) is an example that has such a factorization structure.
Let $G = (V, E)$ be a graph and $\mathcal{X} = \prod_{i \in V} \mathcal{X}_i$ be a discrete set. A positive probability
distribution $p$ on $\mathcal{X}$ is said to be a Markov random field on $G$ if it satisfies

$$p(x_i \mid x_{V \setminus i}) = p(x_i \mid x_{N_i}) \quad \text{for all } i \in V. \tag{2.2}$$






See, e.g., [73] for further material. A clique is a subset of vertices every two of which are
connected by an edge. The Hammersley-Clifford theorem says that

$$p(x) \propto \prod_{C \in \mathcal{C}} \Psi_C(x_C), \tag{2.3}$$

where $\mathcal{C}$ is a set of cliques. A proof of this theorem, using the Möbius inversion technique,
is found in [38].

Bayesian networks provide another class of examples of factorized distributions. The
scope of applications of Bayesian networks includes expert systems [30], speech recognition
[67] and bioinformatics [33]. Consider a Directed Acyclic Graph (DAG), i.e., a directed
graph without directed cycles. A Bayesian network is specified by local conditional
probabilities associated with the DAG [99, 30]. Namely, it is given by the following product

$$p(x) = \prod_{i \in V} p(x_i \mid x_{\pi(i)}), \tag{2.4}$$

where $\pi(i)$ is the set of parents of $i$, i.e., the set of vertices from which an edge is incident on
$i$. If $\pi(i) = \emptyset$, we take $p(x_i \mid x_\emptyset) = p(x_i)$. The factor graph associated with this distribution
consists of factors $\alpha = \{i\} \cup \pi(i)$.
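
As a small illustration of this construction, the sketch below builds the factor scopes $\{i\} \cup \pi(i)$ from a parent list and evaluates the product of local conditional probabilities. The DAG, the binary conditional probability tables and all function names are hypothetical and serve only to make the factorization concrete.

```python
from itertools import product

def bayes_net_factors(parents):
    """Map each vertex i of a DAG to the factor scope {i} | pi(i)."""
    return {i: frozenset({i}) | frozenset(pa) for i, pa in parents.items()}

def joint_probability(x, parents, cpt):
    """Evaluate p(x) = prod_i p(x_i | x_pi(i)) for an assignment x (a dict)."""
    p = 1.0
    for i, pa in parents.items():
        # cpt[i][parent values] is the list [p(x_i = 0 | .), p(x_i = 1 | .)].
        p *= cpt[i][tuple(x[j] for j in pa)][x[i]]
    return p

# Hypothetical DAG: 1 -> 3 <- 2 and 3 -> 4, with binary variables.
parents_example = {1: [], 2: [], 3: [1, 2], 4: [3]}
cpt_example = {
    1: {(): [0.6, 0.4]},
    2: {(): [0.7, 0.3]},
    3: {(0, 0): [0.9, 0.1], (0, 1): [0.5, 0.5], (1, 0): [0.4, 0.6], (1, 1): [0.2, 0.8]},
    4: {(0,): [0.8, 0.2], (1,): [0.3, 0.7]},
}
factors_example = bayes_net_factors(parents_example)
total_probability = sum(
    joint_probability(dict(zip((1, 2, 3, 4), values)), parents_example, cpt_example)
    for values in product((0, 1), repeat=4)
)
```

Because each local table is normalized, the product is already a normalized distribution, so the normalization constant of the corresponding graphical model equals one.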

We often encounter a situation in which the "global constraint" on variables is given as
a logical conjunction of "local constraints." A product of local functions can naturally
represent such a situation. For example, in linear codes, a sequence of binary (0 or 1)
variables $x$ has constraints of the following form, called parity checks:

$$x_{i_1} \oplus x_{i_2} \oplus \cdots \oplus x_{i_k} = 0, \tag{2.5}$$

where $\oplus$ denotes the sum in $\mathbb{F}_2$. For a given set of parity checks, a sequence of binary
variables $x$ is called a codeword if it satisfies all the conditions. A parity check can be
implemented by a local function $\Psi_\alpha$ that is equal to zero if $x_\alpha$ violates Eq. (2.5) ($\alpha =
\{i_1, \ldots, i_k\}$). Furthermore, the product of the local functions expresses the condition for the
linear code. The satisfiability problem (SAT), the coloring problem, the matching problem, etc.,
have the same structure.
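
The encoding of parity checks as local functions can be sketched as follows. The particular checks, the block length and the function names are hypothetical; the point is only that the product of the 0/1-valued factors is nonzero exactly on the codewords.

```python
from itertools import product

def parity_factor(scope):
    """Return Psi_a: 1 if the bits indexed by scope sum to 0 in F_2, else 0."""
    def psi(x):
        return 1 if sum(x[i] for i in scope) % 2 == 0 else 0
    return psi

# Hypothetical parity checks on 4 bits (0-based indices).
checks = [(0, 1, 2), (1, 2, 3)]
factors = [parity_factor(scope) for scope in checks]

def unnormalized_weight(x):
    """Product of all local functions; nonzero only if every check holds."""
    w = 1
    for psi in factors:
        w *= psi(x)
    return w

# Enumerate all binary sequences and keep those with nonzero weight.
codewords = [x for x in product((0, 1), repeat=4) if unnormalized_weight(x) == 1]
```

Normalizing this product gives the uniform distribution on the codewords, so the normalization constant counts them.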
2.2 Loopy Belief Propagation algorithm

Belief Propagation (BP) is an efficient method that calculates exact marginals of a given
distribution factorized according to a tree-structured factor graph [99]. Loopy Belief
Propagation (LBP) is a heuristic application of the algorithm to factor graphs with cycles,
showing successful performance in various problems. We mentioned examples of
applications in Subsection 1.2.3.

In this thesis, we refer to a collection of probability distributions as a family. We carefully
distinguish a family from a model, which gives a single probability distribution. First, in
Subsection 2.2.1, we introduce a class of families, called exponential families, because an
inference family, which is needed for the LBP algorithm and introduced in Subsection 2.2.2,
is a set of exponential families. The details of the LBP algorithm and its exactness on trees
are described in Subsections 2.2.3 and 2.2.4. Subsection 2.2.5 derives the derivative of
the LBP update at LBP fixed points, which determines the stability of the algorithm.



2.2.1 Introduction to exponential families 

Exponential families are the simplest and the most famous class of probability distributions. 
Many important stochastic models such as multinomial, Gaussian, Poisson and gamma 
distributions are all included in this class. Here, we provide a minimal theory on exponential 
families. The core of the theory is the Legendre transform of the log partition function 
and the bijective transform between dualistic parameters, called natural parameter and 
expectation parameter. These techniques are exploited especially in the derivation of the 
Bethe-zeta formula in Section [231 More details of the theory about the exponential families 
is found in books O [21] and a composition from the information geometrical viewpoint is 
found in [2]. 

The following definition of the exponential families is not completely rigorous, but it 
would be enough for the purpose of this thesis. 

Definition 3. Let $\mathcal{X}$ be a set and $\nu$ be a base measure on it. For given $N$ real valued
functions $\phi(x) = (\phi_1(x), \ldots, \phi_N(x))$ on $\mathcal{X}$, a parametric family of probability distributions
on $\mathcal{X}$ is given by

$$p(x; \theta) = \exp\Big( \sum_{i=1}^{N} \theta_i \phi_i(x) - \psi(\theta) \Big), \qquad \psi(\theta) := \log \int \exp\Big( \sum_{i=1}^{N} \theta_i \phi_i(x) \Big) \, d\nu(x)$$

and is called an exponential family. The parameter $\theta$, called the natural parameter, ranges over
the set $\Theta := \mathrm{int}\{\theta \in \mathbb{R}^N ; \int \exp(\sum_{i=1}^{N} \theta_i \phi_i(x)) \, d\nu < \infty\}$, where int denotes the interior of the
set. The function $\phi(x)$ is called the sufficient statistic and $\psi(\theta)$ is called the log partition
function.

An affine transform of the natural parameters gives another exponential family; we
identify it with the original family.

It is known that one can differentiate the log partition function any number of times
by interchanging differential and integral operations [21]. One easily observes that $\Theta$ is a
convex set and $\psi(\theta)$ is a convex function on it. Actually, the convexity of $\Theta$ is derived from
the convexity of the exponential function. The Hessian of $\psi$,

$$\frac{\partial^2 \psi}{\partial \theta_i \partial \theta_j}(\theta) = \mathrm{Cov}_{p_\theta}[\phi_i, \phi_j], \qquad i, j = 1, \ldots, N, \tag{2.6}$$

is obviously positive semidefinite and thus $\psi$ is convex.

In this thesis, we require the following regularity condition for exponential families. 

Assumption 1. The $N$ by $N$ matrix in Eq. (2.6) is positive definite.
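
The identity between derivatives of the log partition function and moments of the sufficient statistic can be checked numerically on a toy family. The sketch below is only an illustration under stated assumptions: a hypothetical four-point sample space with counting base measure, a single sufficient statistic $\phi(x) = x$ (so $N = 1$), and central finite differences in place of exact derivatives.

```python
import math

STATES = [0, 1, 2, 3]  # hypothetical finite sample space, counting base measure

def phi(x):
    """Single sufficient statistic (N = 1)."""
    return float(x)

def psi(theta):
    """Log partition function psi(theta) = log sum_x exp(theta * phi(x))."""
    return math.log(sum(math.exp(theta * phi(x)) for x in STATES))

def moments(theta):
    """Mean and variance of phi under p(x; theta)."""
    probs = [math.exp(theta * phi(x) - psi(theta)) for x in STATES]
    mean = sum(p * phi(x) for p, x in zip(probs, STATES))
    var = sum(p * (phi(x) - mean) ** 2 for p, x in zip(probs, STATES))
    return mean, var

theta0, h = 0.3, 1e-4
# Central finite differences for the first and second derivatives of psi.
d1 = (psi(theta0 + h) - psi(theta0 - h)) / (2 * h)
d2 = (psi(theta0 + h) - 2 * psi(theta0) + psi(theta0 - h)) / (h * h)
mean0, var0 = moments(theta0)
```

The first derivative matches the mean of $\phi$ and the second derivative matches its variance, which is strictly positive here, so Assumption 1 holds for this toy family.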



Legendre transform 

The heart of the theory of exponential families is the duality coming from the Legendre
transform, which is applicable to any convex function and derives the dual parameter set.
A comprehensive treatment of the theory of the Legendre transform is found in [20].

First, we introduce a transform of the natural parameter to the dual parameter. For the
sufficient statistic $\phi$, let $\mathrm{supp}\,\phi$ be the minimal closed set $S \subset \mathbb{R}^N$ for which $\nu(\phi^{-1}(\mathbb{R}^N \setminus
S)) = 0$. The dual parameter set, called the set of expectation parameters, is defined by $Y :=
\mathrm{int}(\mathrm{conv}(\mathrm{supp}\,\phi))$. Obviously, $Y$ is an open convex set. If $\mathcal{X}$ is a finite set, $Y$ is explicitly
expressed as follows:

$$Y = \Big\{ \sum_{x \in \mathcal{X}'} a_x \phi(x) \;\Big|\; \sum_{x \in \mathcal{X}'} a_x = 1, \ a_x > 0 \Big\},$$

where $\mathcal{X}' = \{x \in \mathcal{X} \mid \nu(\{x\}) > 0\}$.

The following theorem is the fundamental result establishing the transform to this dual 
parameter set. 
Theorem 2.1 ([21]). A map

$$\Lambda : \Theta \ni \theta \longmapsto \frac{\partial \psi}{\partial \theta}(\theta) \in Y$$

is a bijection.

Proof. We only prove the injectivity of the map $\Lambda$. Take distinct points $\theta$ and $\theta'$ in $\Theta$.
Define

$$f(t) := \langle \theta' - \theta, \Lambda(\theta + t(\theta' - \theta)) \rangle, \qquad t \in [0, 1], \tag{2.7}$$

where $\langle \cdot, \cdot \rangle$ is the standard inner product. Since the covariance matrix is positive definite
by Assumption 1, $f(t)$ is strictly increasing. Therefore,

$$f(1) - f(0) = \langle \theta' - \theta, \Lambda(\theta') - \Lambda(\theta) \rangle > 0.$$

This yields $\Lambda(\theta') \neq \Lambda(\theta)$.

The proof of the surjectivity is found in Theorem 3.6 of [21]. □

The map $\Lambda$, which is referred to as a moment map, is also written as the expectation of
the sufficient statistic: $\Lambda(\theta) = \mathrm{E}_{p_\theta}[\phi]$.

The Legendre transform of $\psi(\theta)$ on $\Theta$, which gives a convex function on the dual
parameter set, is defined by

$$\varphi(\eta) = \sup_{\theta \in \Theta} \big( \langle \theta, \eta \rangle - \psi(\theta) \big), \qquad \eta \in Y, \tag{2.8}$$

where $\langle \theta, \eta \rangle = \sum_i \theta_i \eta_i$ is the inner product. This function is convex with respect to $\eta$,
because it is a supremum of linear functions. Since the expression in the supremum in
Eq. (2.8) is concave with respect to $\theta$, the supremum is uniquely attained at $\theta(\eta)$ that
satisfies $\eta = \Lambda(\theta(\eta))$. This equation implies that the map $\eta \mapsto \theta(\eta)$ is the inverse of $\Lambda$. Note
that $\varphi$ is actually a negative entropy

$$\varphi(\eta) = \mathrm{E}_{p_{\theta(\eta)}}[\log p_{\theta(\eta)}]. \tag{2.9}$$

Note also that the derivative of $\varphi$ gives the inverse of the map $\Lambda$, i.e. $\frac{\partial \varphi}{\partial \eta}(\eta) = \Lambda^{-1}(\eta)$, which is
easily checked by differentiating the equation $\varphi(\eta) = \langle \theta(\eta), \eta \rangle - \psi(\theta(\eta))$. Therefore,
the Hessian of $\varphi$ is the inverse of the covariance matrix and thus $\varphi$ is a strictly convex
function.

The inverse transform of Eq. (2.8) is obtained by an identical equation

$$\psi(\theta) = \sup_{\eta \in Y} \big( \langle \theta, \eta \rangle - \varphi(\eta) \big), \qquad \theta \in \Theta, \tag{2.10}$$

because the supremum in Eq. (2.10) is attained at $\eta(\theta)$ that satisfies $\theta = \Lambda^{-1}(\eta(\theta))$.

In summary, the strictly convex functions $\psi$ and $\varphi$ are the Legendre transforms of each other,
and the natural parameters and the expectation parameters are transformed by $\Lambda$ and $\Lambda^{-1}$,
which are given by the derivatives of the functions.
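
The duality can be made concrete for the simplest case. The sketch below assumes a Bernoulli family, i.e. $\mathcal{X} = \{0, 1\}$ with counting base measure and sufficient statistic $\phi(x) = x$; this choice, the grid used to approximate the supremum and all function names are illustrative assumptions. In this case $\psi(\theta) = \log(1 + e^\theta)$, the moment map is the logistic function, and the Legendre transform is the negative entropy $\eta \log \eta + (1 - \eta) \log(1 - \eta)$.

```python
import math

def psi(theta):
    """Log partition function of the Bernoulli family."""
    return math.log1p(math.exp(theta))

def moment_map(theta):
    """Lambda(theta) = d psi / d theta = E[phi]."""
    return 1.0 / (1.0 + math.exp(-theta))

def inverse_moment_map(eta):
    """theta(eta), the inverse of the moment map on (0, 1)."""
    return math.log(eta / (1.0 - eta))

def varphi(eta):
    """Legendre transform of psi: the negative entropy."""
    return eta * math.log(eta) + (1.0 - eta) * math.log(1.0 - eta)

eta0 = 0.2
theta0 = inverse_moment_map(eta0)

# Value of <theta, eta> - psi(theta) at the maximizer theta(eta).
legendre_value = theta0 * eta0 - psi(theta0)

# Crude check of the supremum over a grid of theta in [-10, 10].
grid_sup = max(t * eta0 - psi(t) for t in (k / 100.0 for k in range(-1000, 1001)))

# Inverse transform evaluated at eta(theta) = Lambda(theta).
psi_back = theta0 * moment_map(theta0) - varphi(moment_map(theta0))
```

The value at the maximizer agrees with the negative entropy, the grid search never exceeds it, and the inverse transform recovers $\psi(\theta)$, as stated in the summary above.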

Examples of exponential families 

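Example 1 (Multinomial distributions). Let $\mathcal{X} = \{1, \ldots, N\}$ be a finite set with the
uniform base measure. We define sufficient statistics as

$$\phi_k(x) = \begin{cases} 1 & \text{if } x = k \\ 0 & \text{otherwise} \end{cases} \tag{2.11}$$

for $k = 1, \ldots, N - 1$. Then the given exponential family is called multinomial distributions
and coincides with the set of all probability distributions on $\mathcal{X}$ that have positive probabilities for
all elements of $\mathcal{X}$.

By definition, the region of natural parameters is $\Theta = \mathbb{R}^{N-1}$. The region of expectation
parameters is the interior of the probability simplex. That is,

$$Y = \Big\{ (\eta_1, \ldots, \eta_{N-1}) \;\Big|\; \sum_{k=1}^{N-1} \eta_k < 1, \ \eta_k > 0 \Big\}. \tag{2.12}$$

Example 2 (Gaussian distributions). Let $\mathcal{X} = \mathbb{R}^n$ with the Lebesgue measure and let
$\phi_i(x_i) = x_i$ and $\phi_{ij}(x_i, x_j) = x_i x_j$. The exponential family given by the sufficient
statistics $\phi(x) = (\phi_i(x_i), \phi_{jk}(x_j, x_k))_{1 \le i \le n, \, 1 \le j \le k \le n}$, called Gaussian distributions, consists of
probability distributions of the form

$$p(x; \theta) = \exp\Big( \sum_{i=1}^{n} \theta_i x_i + \sum_{1 \le i \le j \le n} \theta_{ij} x_i x_j - \psi(\theta) \Big). \tag{2.13}$$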
If we set $J_{ij} = J_{ji} = -\theta_{ij}$ ($i \neq j$), $J_{ii} = -2\theta_{ii}$ and $h_i = \theta_i$, it becomes

$$p(x; \theta) = \exp\Big( -\frac{1}{2} x^T J x + h^T x - \psi(\theta) \Big), \tag{2.14}$$

$$\psi(\theta) = \frac{n}{2} \log 2\pi - \frac{1}{2} \log \det J + \frac{1}{2} h^T J^{-1} h. \tag{2.15}$$

Obviously, the set of natural parameters is $\Theta = \{\theta \in \mathbb{R}^N \mid J \text{ is positive definite}\}$, where
$N = n + \frac{n(n+1)}{2}$. As is well known, the mean and covariance of $x$ are given by $\mu = J^{-1} h$
and $\Sigma = J^{-1}$, respectively. The transform to the dual parameter is given by the expectation
of the sufficient statistic: $\Lambda(\theta) = (\mu_i, \Sigma_{ij} + \mu_i \mu_j)$. Therefore, the set of expectation parameters
is $Y = \{\eta \in \mathbb{R}^N \mid \Sigma(\eta) := (\eta_{ij} - \eta_i \eta_j)_{1 \le i, j \le n} \text{ is positive definite}\}$. The dual convex function
$\varphi$ is

$$\varphi(\eta) = -\frac{n}{2}(1 + \log 2\pi) - \frac{1}{2} \log \det \Sigma(\eta). \tag{2.16}$$

For a given mean vector $\mu = (\mu_i)$, the fixed-mean Gaussian distributions form the
exponential family obtained by the sufficient statistics $\phi(x) = \{(x_i - \mu_i)(x_j - \mu_j)\}_{1 \le i \le j \le n}$.
Moreover, if $\mu = 0$, the family is called the zero-mean Gaussian distributions.

2.2.2 Inference family for LBP 

In this subsection, we construct a set of exponential families used in the LBP algorithm.
In order to perform inference using LBP for a given graphical model, we have to fix a
"family" that includes the probability distribution.

Let $H = (V, F)$ be a hypergraph. We continue to follow the notation of Subsection
2.1.2. For each vertex $i$, we consider an exponential family $\mathcal{E}_i$ with a sufficient statistic
$\phi_i$$^{2}$ and a base measure $\nu_i$ on $\mathcal{X}_i$. The natural parameter, expectation parameter, log
partition function and its Legendre transform are denoted by $\theta_i$, $\eta_i$, $\psi_i$ and $\varphi_i$, respectively.
Furthermore, for each factor $\alpha = \{i_1, \ldots, i_{d_\alpha}\}$, we give an exponential family $\mathcal{E}_\alpha$ on $\mathcal{X}_\alpha =
\prod_{i \in \alpha} \mathcal{X}_i$ with the base measure $\nu_\alpha = \prod_{i \in \alpha} \nu_i$ and a sufficient statistic $\phi_\alpha$ of the form

$$\phi_\alpha(x_\alpha) = \big( \phi_{\langle \alpha \rangle}(x_\alpha), \phi_{i_1}(x_{i_1}), \ldots, \phi_{i_{d_\alpha}}(x_{i_{d_\alpha}}) \big). \tag{2.17}$$

An important point is that $\phi_\alpha$ includes the sufficient statistics of $i \in \alpha$ as components.
The natural parameter, expectation parameter, log partition function and its Legendre
transform of this family are denoted by

$$\theta_\alpha = (\theta_{\langle \alpha \rangle}, \theta_{\alpha:i_1}, \ldots, \theta_{\alpha:i_{d_\alpha}}) \in \Theta_\alpha, \quad \eta_\alpha = (\eta_{\langle \alpha \rangle}, \eta_{\alpha:i_1}, \ldots, \eta_{\alpha:i_{d_\alpha}}) \in Y_\alpha, \quad \psi_\alpha \text{ and } \varphi_\alpha. \tag{2.18}$$

$^{2}$ In the previous subsection, we used bold symbols to represent vectors, but from here we simplify the
notation.

In order to use these exponential families $\mathcal{E}_\alpha$ and $\mathcal{E}_i$ for LBP, we need an assumption.

Assumption 2 (Marginally closed assumption). For all pairs $i \in \alpha$,

$$\int p(x_\alpha) \, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) \in \mathcal{E}_i \quad \text{for all } p \in \mathcal{E}_\alpha. \tag{2.19}$$

Definition 4. A collection of the exponential families $\mathcal{I} := \{\mathcal{E}_\alpha, \mathcal{E}_i\}$ given by sufficient
statistics $(\phi_{\langle \alpha \rangle}(x_\alpha), \phi_i(x_i))_{\alpha \in F, i \in V}$ as above, satisfying Assumptions 1 and 2, is called an
inference family associated with a hypergraph $H$. An inference family is called pairwise if
the associated hypergraph is a graph.

An inference family has a parameter set $\Theta = \prod_\alpha \Theta_\alpha \times \prod_i \Theta_i$, which is bijectively mapped
to the dual parameter set $Y = \prod_\alpha Y_\alpha \times \prod_i Y_i$ by the maps of the respective components.

An inference family naturally defines an exponential family on $\mathcal{X} = \prod_i \mathcal{X}_i$ with the sufficient
statistic $(\phi_{\langle \alpha \rangle}(x_\alpha), \phi_i(x_i))_{\alpha \in F, i \in V}$. This exponential family is called the global exponential
family and denoted by $\mathcal{E}(\mathcal{I})$.

Example 3 (Multinomial). Let $\mathcal{E}_i$ be an exponential family of multinomial distributions. Choosing the functions $\phi_{\langle\alpha\rangle}(x_\alpha)$ suitably, we can make each $\mathcal{E}_\alpha$ the family of multinomial distributions on $X_\alpha$. Then the inference family is called a multinomial inference family.

Example 4 (Gaussian). Let $X_i = \mathbb{R}$.³ For the Gaussian case, the sufficient statistics are given by

\[ \phi_i(x_i) = (x_i, x_i^2), \qquad \phi_{\langle\alpha\rangle}(x_\alpha) = (x_i x_j)_{i,j \in \alpha,\, i \neq j}. \tag{2.20} \]

Then the inference family $\mathcal{I}$ is called a Gaussian inference family. Assumption 2 is satisfied because a marginal of a Gaussian distribution is a Gaussian distribution. Fixed-mean cases are completely analogous. Usually, $H$ is a graph rather than a hypergraph. In this thesis, we only consider Gaussian inference families on graphs, but extensions of our results to hypergraphs are straightforward.

3 The extension to the high-dimensional case, i.e. $X_i = \mathbb{R}^{r_i}$, is straightforward.

2.2.3 Basics of the LBP algorithm

The LBP algorithm calculates approximate marginals of the given graphical model $\Psi = \{\Psi_\alpha\}$ using a fixed inference family $\mathcal{I}$. We always assume that the inference family includes the given probability density function.

Assumption 3. For all factors $\alpha \in F$, there exists $\bar\theta_\alpha$ s.t.

\[ \Psi_\alpha(x_\alpha) = \exp(\langle \bar\theta_\alpha, \phi_\alpha(x_\alpha) \rangle). \tag{2.21} \]

This assumption is equivalent to the assumption

\[ p(x) = \frac{1}{Z} \prod_\alpha \Psi_\alpha(x_\alpha) \in \mathcal{E}(\mathcal{I}) \tag{2.22} \]

up to trivial constant re-scalings of $\Psi_\alpha$, which do not affect the LBP algorithm.

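The procedure of the LBP algorithm is as follows [77]. For each pair of a vertex $i \in V$ and a factor $\alpha \in F$ satisfying $i \in \alpha$, an initialized message is given in the form of

\[ m^0_{\alpha \to i}(x_i) = \exp(\langle \mu^0_{\alpha \to i}, \phi_i(x_i) \rangle), \tag{2.23} \]

where the choice of $\mu^0_{\alpha \to i}$ is arbitrary. The set $\{m^0_{\alpha \to i}\}$ or $\{\mu^0_{\alpha \to i}\}$ is called an initialization of the LBP algorithm. At each time $t$, the messages are updated by the following rule:

\[ m^{t+1}_{\alpha \to i}(x_i) = \omega \int \Psi_\alpha(x_\alpha) \prod_{j \in \alpha,\, j \neq i}\ \prod_{\beta \ni j,\, \beta \neq \alpha} m^t_{\beta \to j}(x_j)\, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) \quad (t \ge 0), \tag{2.24} \]

where $\omega$ is a certain scaling constant.⁴ See Fig. 2.4 for an illustration of this message update scheme. From Assumptions 2 and 3, messages can keep the form of Eq. (2.23).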

One may notice that Eq. (2.24) looks slightly different from Eq. (1.11), which involves compatibility functions associated with vertices. However, the compatibility functions associated with vertices can be included in those of factors, and such operations do not essentially affect the LBP algorithm. Therefore, our treatment is general.

Since this update rule simultaneously generates all messages of time $t+1$ from those of time $t$, it is called a parallel update. Another possibility is a sequential update, where, at each time step, one message is chosen according to some prescribed or random order of directed edges. In this thesis, we mainly discuss the parallel update.

Figure 2.4: The blue messages contribute to the red message at the next time step.

4 Here and below, we do not care about the integrability problem. For multinomial and Gaussian cases, there are no problems.

We repeat the update Eq. (2.24) until the messages converge to a fixed point, though this procedure is not guaranteed to converge. Indeed, it sometimes exhibits oscillatory behaviors [89]. The set of LBP fixed points does not depend on the choice of the update rule, but the convergence behavior, or dynamics, does depend on the choice.

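If the algorithm converges, we obtain the fixed point messages $\{m^*_{\alpha \to i}\}$ and beliefs⁵ that are defined by

\[ b_i(x_i) := \omega \prod_{\alpha \ni i} m^*_{\alpha \to i}(x_i), \tag{2.25} \]
\[ b_\alpha(x_\alpha) := \omega\, \Psi_\alpha(x_\alpha) \prod_{j \in \alpha}\ \prod_{\beta \ni j,\, \beta \neq \alpha} m^*_{\beta \to j}(x_j), \tag{2.26} \]

where $\omega$ denotes normalization constants requiring

\[ \int b_i(x_i)\, d\nu_i = 1 \quad \text{and} \quad \int b_\alpha(x_\alpha)\, d\nu_\alpha = 1. \tag{2.27} \]

Note that beliefs automatically satisfy the conditions $b_\alpha(x_\alpha) > 0$ and

\[ \int b_\alpha(x_\alpha)\, d\nu_{\alpha \setminus i}(x_{\alpha \setminus i}) = b_i(x_i). \tag{2.28} \]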



Beliefs are used as approximation of the true marginal distributions p a (x a ) and Pi(xi). We 
will give the approximation of the partition function by LBP, called the Bethe approxima- 
tion, in the next section. 
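The update rule Eq. (2.24) and the beliefs Eqs. (2.25) and (2.26) can be sketched for the multinomial case as follows. This is a minimal illustration, not the thesis's implementation: the function names `loopy_bp` and `exact_vertex_marginals`, the data layout (a list of `(scope, table)` pairs) and the toy chain model are all assumptions made for the example, and messages are stored as probability vectors rather than natural parameters.

```python
import itertools
import numpy as np

def loopy_bp(card, factors, n_iter=50):
    """Parallel LBP (Eq. 2.24) on a discrete factor graph.

    card[i]  : number of states of variable i
    factors  : list of (scope, table); scope is a tuple of variable indices,
               table a positive array with one axis per variable in scope
    Returns (vertex beliefs, factor beliefs) built from the final messages.
    """
    # One message per directed edge (factor a -> variable i), uniform start.
    msg = {(a, i): np.full(card[i], 1.0 / card[i])
           for a, (scope, _) in enumerate(factors) for i in scope}

    def incoming(j, exclude):
        # Product of messages into variable j from every factor except `exclude`.
        out = np.ones(card[j])
        for (b, k), m in msg.items():
            if k == j and b != exclude:
                out = out * m
        return out

    def weighted_table(a, skip=None):
        # Psi_a(x_a) times the incoming products for each j in a (except `skip`).
        scope, table = factors[a]
        t = np.array(table, dtype=float)
        for axis, j in enumerate(scope):
            if j == skip:
                continue
            shape = [1] * len(scope)
            shape[axis] = card[j]
            t = t * incoming(j, a).reshape(shape)
        return t

    for _ in range(n_iter):
        new = {}
        for (a, i) in msg:
            scope = factors[a][0]
            other = tuple(ax for ax, j in enumerate(scope) if j != i)
            m = weighted_table(a, skip=i).sum(axis=other)
            new[(a, i)] = m / m.sum()      # plays the role of the constant omega
        msg = new                          # all messages are replaced at once

    b_vertex = [incoming(i, None) for i in range(len(card))]
    b_vertex = [b / b.sum() for b in b_vertex]
    b_factor = [weighted_table(a) for a in range(len(factors))]
    b_factor = [b / b.sum() for b in b_factor]
    return b_vertex, b_factor

def exact_vertex_marginals(card, factors):
    # Brute-force reference: enumerate all joint states.
    joint = np.zeros(card)
    for x in itertools.product(*[range(c) for c in card]):
        joint[x] = np.prod([table[tuple(x[j] for j in scope)]
                            for scope, table in factors])
    joint /= joint.sum()
    axes = range(len(card))
    return [joint.sum(axis=tuple(k for k in axes if k != i)) for i in axes]

rng = np.random.default_rng(0)
card = [2, 3, 2]
chain = [((0, 1), rng.uniform(0.5, 2.0, (2, 3))),   # a tree: 0 - 1 - 2
         ((1, 2), rng.uniform(0.5, 2.0, (3, 2)))]
b_vertex, b_factor = loopy_bp(card, chain)
exact = exact_vertex_marginals(card, chain)
```

On this tree-structured toy model the beliefs should agree with the brute-force marginals, and the factor beliefs should marginalize to the vertex beliefs as in Eq. (2.28); on graphs with cycles the same code only yields approximations and may fail to converge.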



5 Beliefs are often defined for intermediate-time messages $\{m^t_{\alpha \to i}\}$ by Eqs. (2.25) and (2.26). However, in this thesis, beliefs are only defined by fixed point messages.

2.2.4 BP on trees

For understanding the LBP algorithm, a tree is always a good starting point. Historically, the message update scheme of the algorithm was designed to calculate the exact marginals of tree-structured distributions and was called Belief Propagation (BP). Here, we review this fact.

Proposition 2.2. If $H$ is a tree, the LBP algorithm stops after at most $|E|$ updates and the calculated beliefs are equal to the exact marginal distributions of $p$.

Proof. We omit a detailed proof. Basically, the assertion is checked by extending the observations in Subsection 1.2.1. □



2.2.5 LBP as a dynamical system 

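At each time $t$, the state of the algorithm is specified by the set of messages $\{m^t_{\alpha \to i}\}$, which is identified with its natural parameters $\mu^t = \{\mu^t_{\alpha \to i}\}$, a finite-dimensional real vector. In terms of these parameters, the update rule Eq. (2.24) is written as follows:

\[ \mu^{t+1}_{\alpha \to i} = \Lambda_i^{-1}\Big( \Lambda_\alpha\big( \bar\theta_{\langle\alpha\rangle},\ \bar\theta_{\alpha:i_1} + \sum_{\beta \in N_{i_1} \setminus \alpha} \mu^t_{\beta \to i_1},\ \ldots,\ \bar\theta_{\alpha:i_k} + \sum_{\beta \in N_{i_k} \setminus \alpha} \mu^t_{\beta \to i_k} \big)_i \Big) - \sum_{\beta \in N_i \setminus \alpha} \mu^t_{\beta \to i}, \tag{2.29} \]

where $\alpha = \{i_1, \ldots, i_k\}$, $d_\alpha = k$ and $\Lambda_\alpha(\cdots)_i$ is the $i$-th component ($i \in \alpha$). To obtain this equation, multiply Eq. (2.24) by

\[ \prod_{\beta \in N_i \setminus \alpha} m^t_{\beta \to i}(x_i) \]

and normalize it to be a probability distribution. Then take the expectation of $\phi_i$.

The update rule can be viewed as a transform $T$ on the set $M$ of natural parameters of messages. Formally,

\[ T : M \to M, \qquad \mu^t = T(\mu^{t-1}). \]

In this formulation, the fixed points of LBP are $\{\mu^* \in M \mid \mu^* = T(\mu^*)\}$.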

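In order to get familiar with the computation techniques, here we compute the derivative of the update map $T$ around an LBP fixed point. This expression is derived in [63] for the cases of turbo and LDPC codes.

Theorem 2.2 (Differentiation of the LBP update). At an LBP fixed point, the differentiation (linearization) of the LBP update is

\[ \frac{\partial T(\mu)_{\alpha \to i}}{\partial \mu_{\beta \to j}} = \begin{cases} \mathrm{Var}_{b_i}[\phi_i]^{-1}\, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j] & \text{if } j \in N_\alpha \setminus i \text{ and } \beta \in N_j \setminus \alpha, \\ 0 & \text{otherwise.} \end{cases} \tag{2.30} \]

Proof. First, consider the case that $j \in N_\alpha \setminus i$ and $\beta \in N_j \setminus \alpha$. By the chain rule applied to Eq. (2.29), the derivative is equal to

\[ \frac{\partial \Lambda_i^{-1}}{\partial \eta_i} \frac{\partial \Lambda_\alpha(\theta_\alpha)_i}{\partial \theta_{\alpha:j}} = \mathrm{Var}_{b_i}[\phi_i]^{-1}\, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j]. \tag{2.31} \]

Another case is $i = j$ and $\alpha, \beta \in N_i$ ($\alpha \neq \beta$). Then, the derivative is

\[ \mathrm{Var}_{b_i}[\phi_i]^{-1}\, \mathrm{Var}_{b_\alpha}[\phi_i] - I = 0 \tag{2.32} \]

because $\mathrm{Var}_{b_i}[\phi_i] = \mathrm{Var}_{b_\alpha}[\phi_i]$ from Eq. (2.28). In other cases, the derivative is trivially zero. □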

The relation $j \in N_\alpha \setminus i$ and $\beta \in N_j \setminus \alpha$ will be written as $(\beta \to j) \rightharpoonup (\alpha \to i)$ in Subsection 3.2.1. We will discuss the relations between the differentiation of $T$ and stability properties of the LBP algorithm in Section 4.4.

It is noteworthy that the elements of the linearization matrix are explicitly expressed by the fixed point beliefs.



2.3 Bethe free energy 

The Bethe approximation was initiated in the paper of Bethe [15] to analyze physical phases 
of two atom alloy. Roughly speaking, the Bethe approximation captures short range fluctu- 
ations computing states in small clusters in a consistent manner. The Bethe approximation 
is known to be exact for distributions on tree-structured graphs. The modern formulation 
for presenting the approximation is a variational problem of the Bethe free energy [B]. In 
the end of this section, we see that this approximation is equivalent to the LBP algorithm. 
This relation was first clearly formulated by Yedidia et al. [135].

In this section, we introduce two types of Bethe free energy functions, both of which yield a variational characterization of the Bethe approximation. These two functions are basically similar and have the same values at the points corresponding to the LBP fixed points. The first type is essentially the one used by Yedidia et al. [135] to show the equivalence of the Bethe approximation and LBP. In [63], Ikeda et al. discuss relations between these two types of Bethe free energy functions on a constrained set.

2.3.1 Gibbs free energy function 

First, we introduce the Gibbs free energy function, because the Bethe free energy function is a computationally tractable approximation of it. For a given graphical model $\Psi = \{\Psi_\alpha\}$, the Gibbs free energy $F_{\mathrm{Gibbs}}$ is a function over the set of probability distributions $\hat{p}$ on $X = \prod_{i \in V} X_i$ defined by

\[ F_{\mathrm{Gibbs}}(\hat{p}) = \int \hat{p}(x) \log\Big( \frac{\hat{p}(x)}{\prod_\alpha \Psi_\alpha(x_\alpha)} \Big)\, d\nu(x), \tag{2.33} \]

where $\nu = \prod_{i \in V} \nu_i$ is the base measure on $X$. Since $y \log y$ is a convex function of $y$, $F_{\mathrm{Gibbs}}$ is a convex function with respect to $\hat{p}$. Using the Kullback-Leibler divergence $D(q \| p) = \int q \log(q/p)$, Eq. (2.33) becomes

\[ F_{\mathrm{Gibbs}}(\hat{p}) = D(\hat{p} \| p) - \log Z. \tag{2.34} \]

Therefore, the exact distribution Eq. (2.1) is characterized by a variational problem

\[ p(x) = \operatorname*{argmin}_{\hat{p}} F_{\mathrm{Gibbs}}(\hat{p}), \tag{2.35} \]

where the minimum is taken over all probability distributions on $X$. As suggested by the name "free energy," the minimum value of this function is equal to $-\log Z$.

From Assumption 3, $p$ is in the global exponential family $\mathcal{E}(\mathcal{I})$. Therefore, it is possible to restrict the range of the minimization to $\mathcal{E}(\mathcal{I})$ without changing the outcome of the minimization.
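As a small numerical illustration of Eqs. (2.33) to (2.35) in the discrete case, the sketch below evaluates the Gibbs free energy on a toy model with three binary variables. The model, the helper names `gibbs_free_energy` and `kl`, and the trial distribution `q` are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy model: three binary variables, two pairwise factors.
psi_01 = rng.uniform(0.5, 2.0, size=(2, 2))
psi_12 = rng.uniform(0.5, 2.0, size=(2, 2))
# Unnormalised joint prod_alpha Psi_alpha(x_alpha) as a (2, 2, 2) table.
weight = psi_01[:, :, None] * psi_12[None, :, :]
Z = weight.sum()
p = weight / Z

def gibbs_free_energy(p_hat, weight):
    """Discrete version of Eq. (2.33), with the convention 0 log 0 = 0."""
    mask = p_hat > 0
    return float(np.sum(p_hat[mask] * np.log(p_hat[mask] / weight[mask])))

def kl(q, p):
    """Kullback-Leibler divergence D(q || p) for strictly positive p."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

q = rng.dirichlet(np.ones(8)).reshape(2, 2, 2)   # an arbitrary trial distribution
F_p = gibbs_free_energy(p, weight)
F_q = gibbs_free_energy(q, weight)
```

Here `F_p` should equal $-\log Z$ up to rounding, and `F_q` should exceed it by exactly `kl(q, p)`, as Eq. (2.34) states. The sum runs over all $2^3$ joint states, which is what becomes intractable for larger models.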

2.3.2 Bethe free energy function 

At least for the discrete variable case, computing values of the Gibbs free energy function is intractable in general because the integral in Eq. (2.33) is in fact a sum over $|X| = \prod_i |X_i|$ states. We introduce functions called Bethe free energies that do not include such a sum over an exponential number of states.

There are two types of Bethe free energy functions: the first type is defined on an affine subspace of expectation parameters whereas the second type is defined on an affine subspace of natural parameters. In information geometry, such subspaces are called m-affine spaces and e-affine spaces, respectively [3].



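Type 1

Definition 5. The type 1 Bethe free energy function is a function of expectation parameters. For a given inference family $\mathcal{I}$, a set $L(\mathcal{I})$⁶ is defined by $L(\mathcal{I}) = \{\eta = \{\eta_\alpha, \eta_i\};\ \eta_{\alpha:i} = \eta_i\}$. On this set, the Bethe free energy function is defined by

\[ F(\eta) := -\sum_{\alpha \in F} \langle \bar\theta_\alpha, \eta_\alpha \rangle + \sum_{\alpha \in F} \varphi_\alpha(\eta_\alpha) + \sum_{i \in V} (1 - d_i)\, \varphi_i(\eta_i), \tag{2.36} \]

where $\bar\theta_\alpha$ is the natural parameter of $\Psi_\alpha$.

Since $Y_i$ and $Y_\alpha$ are open convex sets, $L$ is a relatively open convex set. This function is computationally tractable because it is a sum of $O(|F| + |V|)$ terms, assuming the functions $\varphi_\alpha$ and $\varphi_i$ are tractable.

An element of $L$ is called a set of pseudomarginals. The pseudomarginals can be identified with a set of functions $\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}$ that satisfies

1. $b_\alpha \in \mathcal{E}_\alpha$ and $b_i \in \mathcal{E}_i$,

2. $\int b_\alpha(x_\alpha)\, d\nu_{\alpha \setminus i} = b_i(x_i)$.

The second condition is called local consistency. Under this identification, the Bethe free energy function is

\[ F(\{b_\alpha(x_\alpha), b_i(x_i)\}) = -\sum_{\alpha \in F} \int b_\alpha(x_\alpha) \log \Psi_\alpha(x_\alpha)\, d\nu_\alpha + \sum_{\alpha \in F} \int b_\alpha(x_\alpha) \log b_\alpha(x_\alpha)\, d\nu_\alpha + \sum_{i \in V} (1 - d_i) \int b_i(x_i) \log b_i(x_i)\, d\nu_i. \tag{2.37} \]

6 For multinomial cases, the closure of this set is called the local polytope [125, 124].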
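Example 5 (Multinomial inference family). Let $\mathcal{I}$ be a multinomial inference family. The set $L$, whose closure is the local polytope, is given by

\[ L = \Big\{ \{b_\alpha, b_i\}_{\alpha \in F, i \in V} \;\Big|\; b_\alpha(x_\alpha) > 0,\ \sum_{x_\alpha} b_\alpha(x_\alpha) = 1,\ \sum_{x_{\alpha \setminus i}} b_\alpha(x_\alpha) = b_i(x_i)\ \ \forall i \in \alpha \Big\}. \]

The Bethe free energy function is

\[ F = -\sum_{\alpha \in F} \sum_{x_\alpha} b_\alpha(x_\alpha) \log \Psi_\alpha(x_\alpha) + \sum_{\alpha \in F} \sum_{x_\alpha} b_\alpha(x_\alpha) \log b_\alpha(x_\alpha) + \sum_{i \in V} (1 - d_i) \sum_{x_i} b_i(x_i) \log b_i(x_i). \tag{2.38} \]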

In order to see the relation between the Bethe free energy function and the Gibbs free energy function, we construct a map from the domain of the Bethe free energy as follows:

\[ \Pi(\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}) := \prod_\alpha b_\alpha(x_\alpha) \prod_i b_i(x_i)^{1 - d_i}. \tag{2.39} \]

The following fact gives an insight into why the Bethe free energy function approximates the Gibbs free energy function.

Proposition 2.3. If $H$ is a tree, $\Pi$ is a bijective map from $L$ to $\mathcal{E}(\mathcal{I})$. The inverse map is obtained by the marginals of $p \in \mathcal{E}(\mathcal{I})$. Under this map, the Gibbs free energy function coincides with the Bethe free energy function: $F = F_{\mathrm{Gibbs}} \circ \Pi$.

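Proof. Since $H$ is a tree, one easily observes that $\int \prod_\alpha b_\alpha \prod_i b_i^{1 - d_i}\, d\nu = 1$. This implies $\Pi(b) \in \mathcal{E}(\mathcal{I})$ (where $b = \{b_\alpha(x_\alpha), b_i(x_i)\}$). The injectivity of $\Pi$ is obvious because the marginals of $\Pi(b)$ are $\{b_\alpha(x_\alpha), b_i(x_i)\}$. For a given $p \in \mathcal{E}(\mathcal{I})$, let $\{p_\alpha, p_i\}$ be the set of its marginal distributions. We see that $\Pi(\{p_\alpha, p_i\}) = p$ because the expectation parameters $\{\eta_{\langle\alpha\rangle}, \eta_i\}$ of the global exponential family are equal. Thus the first part of the assertion is proved.

Next, we check that $F = F_{\mathrm{Gibbs}} \circ \Pi$. Since the marginals of $\Pi(b)$ are $\{b_\alpha(x_\alpha), b_i(x_i)\}$, we obtain

\[ \begin{aligned} F_{\mathrm{Gibbs}} \circ \Pi(b) &= -\sum_\alpha \int \Pi(b) \log \Psi_\alpha(x_\alpha)\, d\nu(x) + \sum_\alpha \int \Pi(b) \log b_\alpha(x_\alpha)\, d\nu(x) \\ &\quad + \sum_{i \in V} (1 - d_i) \int \Pi(b) \log b_i(x_i)\, d\nu(x) \\ &= F(b). \end{aligned} \]

□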
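Proposition 2.3 can be checked numerically in the multinomial case. The sketch below evaluates Eq. (2.38) at the exact marginals of a small tree-structured model and compares the result with $-\log Z$; the chain model, the helper `bethe_free_energy` and all variable names are illustrative assumptions.

```python
import numpy as np

def bethe_free_energy(psis, b_factor, b_vertex, degree):
    """Eq. (2.38); every table is assumed strictly positive."""
    F = sum(np.sum(b * (np.log(b) - np.log(psi))) for psi, b in zip(psis, b_factor))
    F += sum((1 - d) * np.sum(b * np.log(b)) for b, d in zip(b_vertex, degree))
    return float(F)

rng = np.random.default_rng(0)
psi_01 = rng.uniform(0.5, 2.0, (2, 2))
psi_12 = rng.uniform(0.5, 2.0, (2, 2))
weight = psi_01[:, :, None] * psi_12[None, :, :]   # unnormalised joint on the chain 0 - 1 - 2
Z = weight.sum()
p = weight / Z

b_factor = [p.sum(axis=2), p.sum(axis=0)]          # exact marginals on {0, 1} and {1, 2}
b_vertex = [p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1))]
degree = [1, 2, 1]                                 # d_i = number of factors containing i

F_bethe = bethe_free_energy([psi_01, psi_12], b_factor, b_vertex, degree)
# Pi(b) from Eq. (2.39): product of factor beliefs times vertex beliefs^(1 - d_i).
Pi_b = b_factor[0][:, :, None] * b_factor[1][None, :, :] / b_vertex[1][None, :, None]
```

Because the chain is a tree, `Pi_b` should reproduce the joint distribution `p`, and `F_bethe` should equal $-\log Z$, the minimum of the Gibbs free energy. On a graph with cycles neither identity is expected to hold in general.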
For general factor graphs, $\Pi(b)$ is not necessarily normalized. This property is related to the approximation error of the partition function (see Lemma 6.1 for details). Note also that, for general factor graphs, marginal distributions of an element in $\mathcal{E}(\mathcal{I})$ are not necessarily elements of $\mathcal{E}_\alpha$ or $\mathcal{E}_i$. However, for multinomial and (fixed-mean) Gaussian inference families, they are, even if $H$ is not a tree.

Though the Bethe free energy function $F$ approximates the convex function $F_{\mathrm{Gibbs}}$, it is not necessarily convex, nor does it necessarily have a unique minimum. Though the functions $\varphi_\alpha$ and $\varphi_i$ are convex, the negative coefficients $(1 - d_i)$ make the function complicated. In general, the convexity of $F$ is broken as the nullity of the underlying hypergraph grows. The positive-definiteness of the Hessian of the Bethe free energy will be analyzed in Section 4.3 using the Bethe-zeta formula.

Type 2 

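The second type of Bethe free energy function is a function of natural parameters.

Definition 6. Define an affine space of natural parameters by

\[ A(\mathcal{I}, \Psi) := \Big\{ \theta = \{\theta_\alpha, \theta_i\} \;\Big|\; \theta_{\langle\alpha\rangle} = \bar\theta_{\langle\alpha\rangle}\ \ \forall \alpha \in F, \quad \sum_{\alpha \ni i} \bar\theta_{\alpha:i} = (1 - d_i)\theta_i + \sum_{\alpha \ni i} \theta_{\alpha:i}\ \ \forall i \in V \Big\}, \]

where $\bar\theta_\alpha$ is the natural parameter of $\Psi_\alpha$. The type 2 Bethe free energy function⁷ $\mathcal{F}$ is a function on $A(\mathcal{I}, \Psi)$ defined by

\[ \mathcal{F}(\theta) = -\sum_{\alpha \in F} \psi_\alpha(\theta_\alpha) - \sum_{i \in V} (1 - d_i)\, \psi_i(\theta_i). \tag{2.40} \]

Note that $\mathcal{F}$ itself does not depend on the given distribution $\Psi$, in contrast to $F$. Note also that $L$ and $A$ are subsets of the same set $Y \simeq \Theta$, where the identification is given by the map $\prod_\alpha \Lambda_\alpha \times \prod_i \Lambda_i$. As we see in the next subsection, the values of $F$ and $\mathcal{F}$ coincide at the intersections of $L$ and $A$.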

2.3.3 Characterizations of the LBP fixed points 

We present several characterizations of LBP fixed points. As we will discuss in Chapter 4, this presentation gives an intuitive understanding of the Bethe-zeta formula. For the characterizations, we use a formal definition of beliefs. We will see that it coincides with the one given in Subsection 2.2.3 once Theorem 2.3 is established.

7 In this thesis, we mean "Bethe free energy function" by the type 1 unless otherwise stated.

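Definition 7. For a given inference family $\mathcal{I}$ and graphical model $\Psi = \{\Psi_\alpha\}$, a set of beliefs $\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}$ is a set of pseudomarginals that satisfies

\[ \prod_\alpha b_\alpha(x_\alpha) \prod_i b_i(x_i)^{1 - d_i} \propto \prod_\alpha \Psi_\alpha(x_\alpha). \tag{2.41} \]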

Theorem 2.3. Let $\mathcal{I}$ be an inference family and $\Psi = \{\Psi_\alpha\}$ be a graphical model. The following sets are naturally identified with each other.

1. The set of fixed points of loopy belief propagation.

2. The set of the beliefs.

3. The set of intersections of $L(\mathcal{I})$ and $A(\mathcal{I}, \Psi)$.

4. The set of stationary points of $F$ over $L(\mathcal{I})$.

5. The set of stationary points of $\mathcal{F}$ over $A(\mathcal{I}, \Psi)$.

Furthermore, for an LBP fixed point $\{\theta_\alpha, \theta_i\}$, the corresponding beliefs are given by

\[ b_i(x_i) = \exp(\langle \theta_i, \phi_i(x_i) \rangle - \psi_i(\theta_i)), \tag{2.42} \]
\[ b_\alpha(x_\alpha) = \exp(\langle \theta_\alpha, \phi_\alpha(x_\alpha) \rangle - \psi_\alpha(\theta_\alpha)). \tag{2.43} \]

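Proof. 2 ⇔ 3: Since a set of beliefs is a set of pseudomarginals, the beliefs are identified with $\{\theta_\alpha, \theta_i\}$ satisfying the local consistency conditions and Eq. (2.41). These conditions are obviously equivalent to the constraints of $L(\mathcal{I})$ and $A(\mathcal{I}, \Psi)$ respectively.

1 ⇔ 2, 3: For given LBP fixed point messages, the beliefs are given by Eqs. (2.25) and (2.26). By definition, it is easy to check that Eq. (2.41) holds. For the converse direction, we define the messages from the beliefs by

\[ m_{\alpha \to i}(x_i) = \exp(\langle \theta_i + \bar\theta_{\alpha:i} - \theta_{\alpha:i}, \phi_i(x_i) \rangle). \tag{2.44} \]

From the constraints of $L(\mathcal{I})$ and $A(\mathcal{I}, \Psi)$, one observes that

\[ \prod_{\beta \ni i} m_{\beta \to i}(x_i) = \exp(\langle \theta_i, \phi_i(x_i) \rangle) \propto b_i(x_i), \]
\[ \Psi_\alpha(x_\alpha) \prod_{j \in \alpha}\ \prod_{\beta \ni j,\, \beta \neq \alpha} m_{\beta \to j}(x_j) = \exp\Big( \langle \theta_{\langle\alpha\rangle}, \phi_{\langle\alpha\rangle}(x_\alpha) \rangle + \sum_{j \in \alpha} \langle \theta_{\alpha:j}, \phi_j(x_j) \rangle \Big) \propto b_\alpha(x_\alpha). \]

Therefore, the local consistency condition implies that

\[ \prod_{\beta \ni i} m_{\beta \to i}(x_i) \propto \int \Psi_\alpha(x_\alpha) \prod_{j \in \alpha}\ \prod_{\beta \ni j,\, \beta \neq \alpha} m_{\beta \to j}(x_j)\, d\nu_{\alpha \setminus i}. \tag{2.45} \]

This is obviously equivalent to the LBP fixed point equation.

3 ⇔ 4: A point in $L(\mathcal{I})$ is identified with $\{\eta_{\langle\alpha\rangle}, \eta_i\}_{\alpha \in F, i \in V}$. Taking derivatives of $F$ with respect to these variables, we see that the stationary point conditions are $\theta_{\langle\alpha\rangle} = \bar\theta_{\langle\alpha\rangle}$ and $\sum_{\alpha \ni i} \bar\theta_{\alpha:i} = \sum_{\alpha \ni i} \theta_{\alpha:i} + (1 - d_i)\theta_i$, which are the constraints of $A(\mathcal{I}, \Psi)$.

3 ⇔ 5: This equivalence is also checked by taking derivatives of $\mathcal{F}$ in $A(\mathcal{I}, \Psi)$. □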



Condition 3 is an alternative exposition of the characterization of the LBP fixed points given by Ikeda et al. [63]. In that paper, the fixed points are characterized by the "e-condition" and the "m-condition," which partly correspond to the constraints of $A$ and $L$ respectively. Wainwright et al. [123] derive another characterization of the LBP fixed points utilizing spanning trees of the graph. Their redundant representation of exponential families is related to our natural parameters of exponential families in the inference family.

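Finally, we check that the values of $F$ and $\mathcal{F}$ are equal at an LBP fixed point. Using the third characterization and the Legendre relation $\varphi(\eta) + \psi(\theta) = \langle \theta, \eta \rangle$,

\[ F(\eta) - \mathcal{F}(\theta) = -\sum_{\alpha \in F} \langle \bar\theta_\alpha, \eta_\alpha \rangle + \sum_{\alpha \in F} \big( \varphi_\alpha(\eta_\alpha) + \psi_\alpha(\theta_\alpha) \big) + \sum_{i \in V} (1 - d_i) \big( \varphi_i(\eta_i) + \psi_i(\theta_i) \big) = 0. \]

Definition 8. For a given LBP fixed point, the Bethe approximation $Z_B$ of the partition function $Z$ is defined by

\[ -\log Z_B = F(\eta) = \mathcal{F}(\theta). \tag{2.46} \]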
2.3.4 Additional remarks

Extensions and variants of LBP

Generalizing the Bethe free energy function, the approximation method can be further extended to the Cluster Variational Method (CVM) [72], which leads to a message passing algorithm called generalized belief propagation [135, 136]. Another generalization of the Bethe free energy function is proposed in [133]. The derived message passing algorithm is called fractional belief propagation. Expectation propagation, introduced in [83], is derived by relaxing the local consistency condition to consistency of expectations, called weak consistency [56]. All these message passing algorithms include the LBP algorithm as a special case.

The fourth condition in Theorem 2.3 says that the LBP algorithm finds a stationary point of the Bethe free energy function $F$. This viewpoint motivates direct optimization approaches to the Bethe free energy function. Welling and Teh [130] have derived an iterative algorithm that decreases the Bethe free energy function at each step. Yuille [137] also developed the CCCP algorithm for the optimization. One advantage of these algorithms is that they are guaranteed to converge to an LBP fixed point.

Related algorithms 

The max-product algorithm is similar to the LBP algorithm. It is obtained by replacing the sum operator in the LBP update Eq. (2.24) with the max operator. Because max and product satisfy the same arithmetic laws as sum and product, the max-product algorithm is defined in parallel to LBP [3]. It is also obtained as a "zero temperature limit" of the LBP algorithm.



Part I 



Graph zeta in Bethe free energy 
and loopy belief propagation 
Chapter 3

Graph zeta function 



3.1 Introduction 

Zeta functions, such as those of Riemann, Weil and Selberg types, appear in many fields of mathematics. In 1966, Y. Ihara introduced an analogue of the Selberg zeta function and proved its rationality by establishing a determinant formula [60]. Though his zeta function was associated with a certain algebraic object, it was abstracted and extended to be defined on arbitrary finite graphs by the works of J. P. Serre [103], Sunada [115] and Bass [11]. This zeta function is referred to as the Ihara zeta function. There are several generalizations of the Ihara zeta function. The edge zeta function is a multi-variable generalization of the Ihara zeta function, allowing an arbitrary scalar weight for each directed edge [110]. The L-function is also an extension, using a finite dimensional unitary representation of the fundamental group of the graph. Another direction of extension is the zeta function of hypergraphs [112].

In this chapter, unifying these generalizations, we introduce a graph zeta function defined on hypergraphs with matrix weights. We show an Ihara-Bass type determinant formula based on simple determinant operations (Proposition A.2). This formula plays an important role in establishing the relations between this zeta function and LBP in the next chapter.

The remainder of this chapter is organized as follows. In Section 3.2, we provide the definition of our graph zeta function as well as necessary definitions for hypergraphs such as prime cycles. In Section 3.3, we show the Ihara-Bass type determinant formula, requiring additional structure on the matrix weights. Miscellaneous properties of the one-variable hypergraph zeta function are discussed in Section 3.4. We conclude in Section 3.5 with a summary and a discussion of the role of these results in the remainder of the thesis.
3.2 Basics of the graph zeta function

3.2.1 Definition of the graph zeta function

In the first part of this subsection, in addition to Subsection 2.1.1, we introduce basic definitions and notations of hypergraphs required for the definition of our graph zeta function.

Let $H = (V, F)$ be a hypergraph. As commented in Subsection 2.1.1, it is also denoted by $H = (V \cup F, E)$. For each edge $e = (\alpha \to i) \in E$, $s(e) = \alpha \in F$ is the starting factor of $e$ and $t(e) = i \in V$ is the terminus vertex of $e$. If two edges $e, e' \in E$ satisfy the conditions $t(e) \in s(e')$, $t(e) \neq t(e')$ and $s(e) \neq s(e')$, this pair is denoted by $e \rightharpoonup e'$. (See Figure 3.1.) A sequence of edges $(e_1, \ldots, e_k)$ is said to be a closed geodesic if $e_l \rightharpoonup e_{l+1}$ for $l \in \mathbb{Z}/k\mathbb{Z}$. For a closed geodesic $c$, we may form the $m$-multiple $c^m$ by repeating $c$ $m$ times. If $c$ is not a multiple of a strictly shorter closed geodesic, $c$ is said to be prime. Two closed geodesics are said to be equivalent if one is obtained by a cyclic permutation of the other. An equivalence class of a prime closed geodesic is called a prime cycle. The set of prime cycles of $H$ is denoted by $\mathfrak{P}_H$.

If $H$ is a graph (i.e. $d_\alpha = 2$ for all $\alpha \in F$), these definitions reduce to the standard definitions [76]. (We will explicitly give them in Subsection 3.3.2.) In this case, a factor $\alpha = \{i, j\}$ is identified with an undirected edge $ij$, and $(\alpha \to j)$, $(\alpha \to i)$ are identified with $(i \to j)$, $(j \to i)$ respectively.

Usually, in graph theory, Ihara's graph zeta function is a univariate function associated with a graph. Our graph zeta function, which is needed for the subsequent development of this thesis, is much more complicated. It is defined on a hypergraph with matrix weights. To define matrix weights, we have to prescribe their sizes; we associate a positive integer $r_e$ with each edge $e \in E$. The set of functions on $E$ that take values in $\mathbb{C}^{r_e}$ for each $e \in E$ is denoted by $\mathfrak{X}(E)$, and the set of $n_1 \times n_2$ complex matrices is denoted by $\mathrm{M}(n_1, n_2)$.
Figure 3.2: $C_3$ and its prime cycles.



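Definition 9. For each pair $e' \rightharpoonup e$, a matrix weight $u_{e' \to e} \in \mathrm{M}(r_e, r_{e'})$ is associated. For given matrix weights $u = \{u_{e' \to e}\}$, the graph zeta function of $H$ is defined by

\[ \zeta_H(u) := \prod_{\mathfrak{p} \in \mathfrak{P}_H} \frac{1}{\det(I - \pi(\mathfrak{p}))}, \tag{3.1} \]

where $\pi(\mathfrak{p}) := u_{e_k \to e_1} \cdots u_{e_2 \to e_3} u_{e_1 \to e_2}$ for $\mathfrak{p} = (e_1, \ldots, e_k)$.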



Since $\det(I_n - AB) = \det(I_m - BA)$ for $n \times m$ and $m \times n$ matrices $A$ and $B$, $\det(I - \pi(\mathfrak{p}))$ is well defined for the equivalence class $\mathfrak{p}$. Rigorously speaking, we have to care about convergence; we should restrict the definition to sufficiently small matrix weights $u$. However, as we will discuss in the next subsection, the zeta function has an analytic continuation to the whole space of matrix weights.
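The determinant identity used above is easy to check numerically. The following sketch, with arbitrary random matrices chosen only for illustration, compares both sides for rectangular $A$ and $B$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 5))   # n x m with n = 3, m = 5
B = rng.normal(size=(5, 3))   # m x n
lhs = np.linalg.det(np.eye(3) - A @ B)   # det(I_n - AB)
rhs = np.linalg.det(np.eye(5) - B @ A)   # det(I_m - BA)
```

The two values agree up to floating-point error, which is why cyclically permuting the edges of a prime cycle does not change $\det(I - \pi(\mathfrak{p}))$.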

If $H$ is a graph and $r_e = 1$ for all $e \in E$, this zeta function reduces to the edge zeta function [110]. Furthermore, if all these scalar weights are set to be equal, i.e. $u_{e' \to e} = u$, the zeta function reduces to the Ihara zeta function. On the other hand, for general hypergraphs, we obtain the one-variable hypergraph zeta function by setting all matrix weights to be the same scalar $u$ [112]. These reductions will be discussed in Subsection 3.3.2.

Example 6. $\zeta_H(u) = 1$ if and only if $H$ is a tree. (See Proposition 3.2.) For the 1-cycle graph $C_N$ of length $N$, the prime cycles are $\mathfrak{p}_1 = (e_1, e_2, \ldots, e_N)$ and $\mathfrak{p}_2 = (\bar{e}_N, \bar{e}_{N-1}, \ldots, \bar{e}_1)$. (See Figure 3.2.) By Definition 9, the zeta function is

\[ \zeta_{C_N}(u) = \det(I - \pi(\mathfrak{p}_1))^{-1} \det(I - \pi(\mathfrak{p}_2))^{-1}. \]

Except for the above two types of hypergraphs, the number of prime cycles is infinite.
3.2.2 The first determinant formula

The following determinant formula gives the analytic continuation to the whole space of matrix weights.

Theorem 3.1 (The first determinant formula of zeta function). We define a linear operator $\mathcal{M}(u) : \mathfrak{X}(E) \to \mathfrak{X}(E)$ by

\[ (\mathcal{M}(u) f)(e) = \sum_{e' : e' \rightharpoonup e} u_{e' \to e} f(e'), \qquad f \in \mathfrak{X}(E). \tag{3.2} \]

Then, the following formula holds:

\[ \zeta_H(u)^{-1} = \det(I - \mathcal{M}(u)). \tag{3.3} \]

Note that the matrix representation of the operator $\mathcal{M}(u)$ is

\[ \mathcal{M}(u)_{e,e'} = \begin{cases} u_{e' \to e} & \text{if } e' \rightharpoonup e, \\ 0 & \text{otherwise.} \end{cases} \tag{3.4} \]

The simplification of this matrix (i.e. on a graph, $r_e = 1$, $u = 1$) is called the directed edge matrix in [110] or the Perron-Frobenius operator in [76]. A noteworthy difference between our definition and theirs is that the directions of edges are opposite, because we choose directions to be consistent with illustrations of the LBP algorithm.
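For graphs with equal scalar weights, Theorem 3.1 can be tried out directly. The sketch below builds the directed edge matrix of a simple graph (no multi-edges or self-loops, an assumption of this example) and evaluates $\det(I - u\mathcal{M})$. The helper names `directed_edge_matrix` and `zeta_inverse` are hypothetical.

```python
import numpy as np

def directed_edge_matrix(undirected_edges):
    """M[e, e'] = 1 when e' = (i -> j) can be followed by e = (j -> k), k != i."""
    darts = list(undirected_edges) + [(j, i) for i, j in undirected_edges]
    M = np.zeros((len(darts), len(darts)))
    for r, (o, t) in enumerate(darts):            # row: e = (o -> t)
        for c, (o2, t2) in enumerate(darts):      # column: e' = (o2 -> t2)
            if t2 == o and o2 != t:               # e' ends where e starts, no backtracking
                M[r, c] = 1.0
    return M

def zeta_inverse(undirected_edges, u):
    """det(I - u M), i.e. the right-hand side of Eq. (3.3) with all weights equal to u."""
    M = directed_edge_matrix(undirected_edges)
    return np.linalg.det(np.eye(len(M)) - u * M)

N, u = 5, 0.3
cycle = [(k, (k + 1) % N) for k in range(N)]   # the cycle graph C_N
path = [(k, k + 1) for k in range(N - 1)]      # a tree
```

For the cycle $C_N$ the matrix is a permutation matrix made of two $N$-cycles, one per prime cycle, so `zeta_inverse(cycle, u)` equals $(1 - u^N)^2$. For the path, which is a tree, the matrix is nilpotent and the determinant is 1, in line with Example 6.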

The following proof proceeds in a manner analogous to Theorem 3 in [110]. It is also possible to use Amitsur's theorem [5] as in [10].



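Proof. First define a differential operator

\[ \mathcal{H} := \sum_{e' \rightharpoonup e}\ \sum_{a_e, a_{e'}} (u_{e' \to e})_{a_e, a_{e'}} \frac{\partial}{\partial (u_{e' \to e})_{a_e, a_{e'}}}, \tag{3.5} \]

where $(u_{e' \to e})_{a_e, a_{e'}}$ denotes the $(a_e, a_{e'})$ element of the matrix $u_{e' \to e}$. If we apply this operator to a product of $k$ terms of $u$, it is multiplied by $k$. Since $\log \zeta_H(0) = 0$ and $\log \det(I - \mathcal{M}(0))^{-1} = 0$, it is enough to prove that $\mathcal{H} \log \zeta_H(u) = \mathcal{H} \log \det(I - \mathcal{M}(u))^{-1}$. Using the equations $\log \det X = \mathrm{tr} \log X$ and $-\log(1 - x) = \sum_{k \ge 1} \frac{1}{k} x^k$, we have

\[ \begin{aligned} \mathcal{H} \log \zeta_H(u) &= \mathcal{H} \sum_{\mathfrak{p} \in \mathfrak{P}_H} -\log \det(I - \pi(\mathfrak{p})) \\ &= \mathcal{H} \sum_{\mathfrak{p} \in \mathfrak{P}_H} \sum_{k \ge 1} \frac{1}{k}\, \mathrm{tr}(\pi(\mathfrak{p})^k) \qquad (3.6) \\ &= \sum_{\mathfrak{p} \in \mathfrak{P}_H} \sum_{k \ge 1} |\mathfrak{p}|\, \mathrm{tr}(\pi(\mathfrak{p})^k) \qquad (3.7) \\ &= \sum_{C :\, \text{closed geodesic}} \mathrm{tr}(\pi(C)) = \sum_{k \ge 1} \mathrm{tr}(\mathcal{M}(u)^k). \end{aligned} \]

From Eq. (3.6) to Eq. (3.7), notice that $\mathcal{H}$ acts as a multiplication by $k|\mathfrak{p}|$ on each summand. This is because the summand is a sum of terms of degree $k|\mathfrak{p}|$, counting each $(u_{e' \to e})_{a_e, a_{e'}}$ as degree one. From Eq. (3.7) to the next equation, we used a property of closed geodesics: each one is uniquely represented as a repetition of its minimal period.

On the other hand, one easily observes that

\[ \mathcal{H} \log \det(I - \mathcal{M}(u))^{-1} = \mathcal{H} \sum_{k \ge 1} \frac{1}{k}\, \mathrm{tr}(\mathcal{M}(u)^k) = \sum_{k \ge 1} \mathrm{tr}(\mathcal{M}(u)^k). \]

Then, the proof is completed. □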



3.3 Determinant formula of Ihara-Bass type 

In the previous section, we showed that the zeta function is expressed as a determinant of a matrix of size $\sum_{e \in E} r_e$. In this section, we show another determinant expression, requiring an additional assumption on the matrix weights. The formula is called the Ihara-Bass type determinant formula and is indispensable in the derivation of the Bethe-zeta formula in the next chapter.

3.3.1 The formula 

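In the rest of this subsection, we fix a set of positive integers $\{r_i\}_{i \in V}$ associated with vertices. Let $\{u^\alpha_{i \to j}\}_{\alpha \in F,\ i,j \in \alpha}$ be a set of matrices of size $u^\alpha_{i \to j} \in \mathrm{M}(r_j, r_i)$. Our additional assumption on the set of matrix weights, which is an argument of the zeta function, is that

\[ r_e := r_{t(e)} \quad \text{and} \quad u_{e' \to e} := u^{s(e)}_{t(e') \to t(e)}. \tag{3.8} \]

Figure 3.3: Illustration for the definition of $\iota(u)$.

Then the graph zeta function can be seen as a function of $u = \{u^\alpha_{i \to j}\}$. With slight abuse of notation, it is also denoted by $\zeta_H(u)$. Later in Chapter 4 we see that $r_i$ corresponds to the dimension of the sufficient statistic $\phi_i$ and $u^\alpha_{i \to j}$ becomes the matrix $\mathrm{Var}_{b_j}[\phi_j]^{-1}\, \mathrm{Cov}_{b_\alpha}[\phi_j, \phi_i]$.

To state the Ihara-Bass type determinant formula, we introduce a linear operator $\iota(u) : \mathfrak{X}(E) \to \mathfrak{X}(E)$ defined by

\[ (\iota(u) f)(e) := \sum_{\substack{e' :\ s(e') = s(e) \\ t(e') \neq t(e)}} u^{s(e)}_{t(e') \to t(e)} f(e'), \qquad f \in \mathfrak{X}(E). \tag{3.9} \]

The matrix representation of $\iota(u)$ is a block diagonal matrix because it acts on each factor separately. Therefore $I + \iota(u)$ is also a block diagonal matrix. Each block is indexed by $\alpha \in F$ and denoted by $U_\alpha$. Thus, for $\alpha = \{i_1, \ldots, i_{d_\alpha}\}$,

\[ U_\alpha = \begin{pmatrix} I_{r_{i_1}} & u^\alpha_{i_2 \to i_1} & \cdots & u^\alpha_{i_{d_\alpha} \to i_1} \\ u^\alpha_{i_1 \to i_2} & I_{r_{i_2}} & \cdots & u^\alpha_{i_{d_\alpha} \to i_2} \\ \vdots & \vdots & \ddots & \vdots \\ u^\alpha_{i_1 \to i_{d_\alpha}} & u^\alpha_{i_2 \to i_{d_\alpha}} & \cdots & I_{r_{i_{d_\alpha}}} \end{pmatrix}. \tag{3.10} \]

We also define $w^\alpha_{i \to j}$ by the elements of $W_\alpha = U_\alpha^{-1}$:

\[ W_\alpha = \begin{pmatrix} w^\alpha_{i_1 \to i_1} & w^\alpha_{i_2 \to i_1} & \cdots & w^\alpha_{i_{d_\alpha} \to i_1} \\ w^\alpha_{i_1 \to i_2} & w^\alpha_{i_2 \to i_2} & \cdots & w^\alpha_{i_{d_\alpha} \to i_2} \\ \vdots & \vdots & \ddots & \vdots \\ w^\alpha_{i_1 \to i_{d_\alpha}} & w^\alpha_{i_2 \to i_{d_\alpha}} & \cdots & w^\alpha_{i_{d_\alpha} \to i_{d_\alpha}} \end{pmatrix}. \tag{3.11} \]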
Similar to the definition of $\mathfrak{X}(E)$ in Subsection 3.2.1, we define $\mathfrak{X}(V)$ as the set of functions on $V$ that take values in $\mathbb{C}^{r_i}$ for each $i \in V$.

Theorem 3.2 (Determinant formula of Ihara-Bass type). Let $\mathcal{D}$ and $\mathcal{W}$ be linear transforms on $\mathfrak{X}(V)$ defined by

\[ (\mathcal{D} g)(i) := d_i\, g(i), \qquad (\mathcal{W} g)(i) := \sum_{\substack{e, e' \in E \\ t(e) = i,\ s(e) = s(e')}} w^{s(e)}_{t(e') \to i}\, g(t(e')). \tag{3.12} \]

Then, we have the following formula

\[ \zeta_H(u)^{-1} = \det(I_{r_V} - \mathcal{D} + \mathcal{W}) \prod_{\alpha \in F} \det U_\alpha, \tag{3.13} \]

where $r_V := \sum_{i \in V} r_i$.

The rest of this subsection is devoted to the proof of this formula. The proof is based on the decomposition in the following Lemma 3.1 and the formula of Proposition A.2. We define a linear operator by

\[ T : \mathfrak{X}(V) \to \mathfrak{X}(E), \qquad (T g)(e) := g(t(e)). \]

The vector spaces $\mathfrak{X}(E)$ and $\mathfrak{X}(V)$ have natural inner products. We can think of the adjoint of $T$, which is given by

\[ T^* : \mathfrak{X}(E) \to \mathfrak{X}(V), \qquad (T^* f)(i) := \sum_{e :\, t(e) = i} f(e). \]

These linear operators satisfy the following relation.

Lemma 3.1.

\[ \mathcal{M}(u) = \iota(u) T T^* - \iota(u). \tag{3.14} \]
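Proof. Let $f \in \mathfrak{X}(E)$. Then

\[ \begin{aligned} \big( (\iota(u) T T^* - \iota(u)) f \big)(e) &= \sum_{\substack{e' :\ s(e') = s(e) \\ t(e') \neq t(e)}} u^{s(e)}_{t(e') \to t(e)} \sum_{e'' :\, t(e'') = t(e')} f(e'') - \sum_{\substack{e'' :\ s(e'') = s(e) \\ t(e'') \neq t(e)}} u^{s(e)}_{t(e'') \to t(e)} f(e'') \\ &= \sum_{\substack{e' :\ s(e') = s(e) \\ t(e') \neq t(e)}}\ \sum_{\substack{e'' :\ t(e'') = t(e') \\ e'' \neq e'}} u^{s(e)}_{t(e'') \to t(e)} f(e'') \\ &= (\mathcal{M}(u) f)(e). \end{aligned} \]

□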
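Lemma 3.1 can be verified numerically for scalar weights ($r_i = 1$). In the sketch below the hypergraph, the random weights and all variable names are assumptions chosen for illustration; the relation $e' \rightharpoonup e$ is implemented as $t(e') \in s(e)$, $t(e') \neq t(e)$ and $s(e') \neq s(e)$.

```python
import numpy as np

factors = [(0, 1, 2), (1, 2, 3), (0, 3)]   # hypothetical hyperedges, given as vertex tuples
n_vertices = 4
edges = [(a, i) for a, scope in enumerate(factors) for i in scope]   # e = (alpha -> i)

rng = np.random.default_rng(1)
# Scalar weights u^alpha_{i -> j}, one per ordered pair of distinct vertices in alpha.
u = {(a, i, j): rng.normal() for a, scope in enumerate(factors)
     for i in scope for j in scope if i != j}

n_e = len(edges)
iota = np.zeros((n_e, n_e))        # the operator iota(u) of Eq. (3.9)
M = np.zeros((n_e, n_e))           # the operator M(u) of Eq. (3.2) under assumption (3.8)
T = np.zeros((n_e, n_vertices))    # (T g)(e) = g(t(e)); its adjoint is T.T
for r, (a, i) in enumerate(edges):           # row edge e = (a -> i)
    T[r, i] = 1.0
    for c, (b, j) in enumerate(edges):       # column edge e' = (b -> j)
        if j in factors[a] and j != i:
            if b == a:
                iota[r, c] = u[(a, j, i)]    # same factor: contributes to iota(u)
            else:
                M[r, c] = u[(a, j, i)]       # different factor: e' feeds into e

lemma_rhs = iota @ T @ T.T - iota
```

The matrices `M` and `lemma_rhs` should coincide entry by entry. The subtraction of `iota` removes exactly the terms in which $e''$ and $e$ belong to the same factor.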



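Proof of Theorem 3.2.

\[ \begin{aligned} \zeta_H(u)^{-1} &= \det(I - \mathcal{M}(u)) \\ &= \det(I - \iota(u) T T^* + \iota(u)) \\ &= \det\big(I - \iota(u) T T^* (I + \iota(u))^{-1}\big) \det(I + \iota(u)) \\ &= \det\big(I_{r_V} - T^* (I + \iota(u))^{-1} \iota(u) T\big) \prod_{\alpha \in F} \det(U_\alpha). \end{aligned} \]

It is easy to see that $I_{r_V} - T^* (I + \iota(u))^{-1} \iota(u) T = I_{r_V} - T^* T + T^* (I + \iota(u))^{-1} T$. We observe that

\[ (T^* T g)(i) = \sum_{e :\, t(e) = i} g(t(e)) = d_i\, g(i) \]

and

\[ (T^* (I + \iota(u))^{-1} T g)(i) = \sum_{e :\, t(e) = i} \big( (I + \iota(u))^{-1} T g \big)(e) = (\mathcal{W} g)(i). \]

□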



3.3.2 Special cases of Ihara-Bass type determinant formula 

In this subsection, we rewrite the above formula for two special cases. The first case is Storm's hypergraph zeta function [112], where all matrix weights are set to be the same scalar value $u$. In the second case, the zeta function is associated with a graph, not a hypergraph. This case corresponds to the pairwise inference family when we discuss the relations to the LBP algorithm in the next chapter. In both cases, the matrix $W_\alpha$, which was defined in Eq. (3.11) as the inverse of $U_\alpha$, has an explicit expression.
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One variable hypergraph zeta 

Let Ti = 1 and uf^ = u. The set of functions of E and V are denoted by 3(E) and ^(V) 
instead of X(E) and X{V). We define the directed matrix by M. = i.e., 

fl ife'^e 
M e>e , = < (3.15) 
I otherwise. 

Then, At (it) = U.M. Theorem 13.21 is reduced to the following form. 

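Corollary 3.1.

$$\zeta_H(u)^{-1} = \det\big((1-u)I + u^2 \mathcal{D}(u) - u\mathcal{A}(u)\big)\,(1-u)^{|E|-|V|-|F|} \prod_{\alpha \in F} \big(1 + (d_\alpha - 1)u\big),$$

where

$$(\mathcal{D}(u)g)(i) := \sum_{\alpha \ni i} \frac{d_\alpha - 1}{1 + (d_\alpha - 1)u}\, g(i), \qquad (\mathcal{A}(u)g)(i) := \sum_{\alpha \ni i} \sum_{j \in \alpha,\, j \neq i} \frac{1}{1 + (d_\alpha - 1)u}\, g(j).$$

Proof. $(U_\alpha)_{i,i} = 1$ and $(U_\alpha)_{i,j} = u$ $(i \neq j)$ imply that $\det U_\alpha = (1-u)^{d_\alpha - 1}(1 + (d_\alpha - 1)u)$, $(W_\alpha)_{i,i} = (1 + (d_\alpha - 2)u)(1-u)^{-1}(1 + (d_\alpha - 1)u)^{-1}$ and $(W_\alpha)_{i,j} = -u(1-u)^{-1}(1 + (d_\alpha - 1)u)^{-1}$. Therefore,

$$\begin{aligned}
((I - \mathcal{D} + \mathcal{W})g)(i) &= g(i) - d_i\, g(i) + \sum_{\alpha \ni i} \frac{1 + (d_\alpha - 2)u}{(1-u)(1 + (d_\alpha - 1)u)}\, g(i) - \sum_{\alpha \ni i} \sum_{j \in \alpha,\, j \neq i} \frac{u}{(1-u)(1 + (d_\alpha - 1)u)}\, g(j) \\
&= \Big(\big(I - \tfrac{u}{1-u}\mathcal{A}(u) + \tfrac{u^2}{1-u}\mathcal{D}(u)\big) g\Big)(i).
\end{aligned}$$

Combined with $\sum_{\alpha \in F} (d_\alpha - 1) = |E| - |F|$, this gives the assertion. □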



Corollary 3.1 extends Theorem 16 of [112], where this type of formula is only discussed for $(d, r)$-regular hypergraphs.

Non-hyper graph zeta 

Here and below, we consider the case that $H = (V, F)$ is a graph, i.e., all degrees of hyperedges are equal to two. Then it is identified with an (undirected) graph.

First, we define the zeta function $Z_G$ of a general graph $G = (V, E)$. For each undirected edge of $G$, we make a pair of oppositely directed edges, which form a set of directed edges $\vec{E}$. Thus $|\vec{E}| = 2|E|$. For each directed edge $e \in \vec{E}$, $o(e) \in V$ is the origin of $e$ and $t(e) \in V$ is the terminus of $e$. For $e \in \vec{E}$, the inverse edge is denoted by $\bar{e}$, and the corresponding undirected edge by $[e] = [\bar{e}] \in E$.

A closed geodesic in $G$ is a sequence $(e_1, \ldots, e_k)$ of directed edges such that $t(e_i) = o(e_{i+1})$ and $e_i \neq \bar{e}_{i+1}$ for $i \in \mathbb{Z}/k\mathbb{Z}$. Prime cycles are defined in a similar manner to that of hypergraphs. The set of prime cycles is denoted by $\mathfrak{P}_G$.

Definition 10. Let $G = (V, E)$ be a graph. For given positive integers $\{r_i\}_{i \in V}$ and matrix weights $u = \{u_e\}_{e \in \vec{E}}$ of sizes $u_e \in M(r_{t(e)}, r_{o(e)})$,

$$Z_G(u) := \prod_{p \in \mathfrak{P}_G} \det(I - \pi(p))^{-1}, \qquad \pi(p) := u_{e_1} \cdots u_{e_k} \ \text{ for } p = (e_1, \ldots, e_k). \tag{3.16}$$

This zeta function is the matrix weight extension of the edge zeta function in [110], where the edge weights are scalar values. Since $\mathfrak{P}_{G_H}$ is naturally identified with $\mathfrak{P}_H$, $Z_{G_H} = \zeta_H$.

Corollary 3.2. For a graph $G = (V, E)$,

$$Z_G(u)^{-1} = \det\big(I + \hat{\mathcal{D}}(u) - \hat{\mathcal{A}}(u)\big) \prod_{[e] \in E} \det(I - u_e u_{\bar{e}}), \tag{3.17}$$

where $\hat{\mathcal{D}}$ and $\hat{\mathcal{A}}$ are defined by

$$(\hat{\mathcal{D}}(u)g)(i) := \sum_{e : t(e) = i} (I_{r_i} - u_e u_{\bar{e}})^{-1} u_e u_{\bar{e}}\, g(i), \tag{3.18}$$

$$(\hat{\mathcal{A}}(u)g)(i) := \sum_{e : t(e) = i} (I_{r_i} - u_e u_{\bar{e}})^{-1} u_e\, g(o(e)). \tag{3.19}$$

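Proof. For a directed edge $e$ with $t(e) = i$ and $o(e) = j$, the $U_{[e]}$ block is given by

$$U_{[e]} = \begin{pmatrix} I_{r_i} & u_e \\ u_{\bar{e}} & I_{r_j} \end{pmatrix}. \tag{3.20}$$

Therefore $\det U_{[e]} = \det(I_{r_i} - u_e u_{\bar{e}})$ and the inverse $W_{[e]}$ is

$$W_{[e]} = \begin{pmatrix} (I_{r_i} - u_e u_{\bar{e}})^{-1} & -(I_{r_i} - u_e u_{\bar{e}})^{-1} u_e \\ -(I_{r_j} - u_{\bar{e}} u_e)^{-1} u_{\bar{e}} & (I_{r_j} - u_{\bar{e}} u_e)^{-1} \end{pmatrix}. \tag{3.21}$$

Plugging these equations into Theorem 3.2, we obtain the assertion. □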



In [85] and [58], a weighted graph version of the Ihara-Bass type determinant formula is derived assuming that the scalar weights $\{u_e\}_{e \in \vec{E}}$ satisfy the condition $u_e u_{\bar{e}} = u^2$. In this case, the factors $(1 - u_e u_{\bar{e}})^{-1}$ in Eqs. (3.18) and (3.19) do not depend on $e$ and Eq. (3.17) is simplified. Corollary 3.2 extends the result to arbitrarily weighted graphs. A direct proof of Corollary 3.2, without discussing the hypergraph case, is found in Theorem 2 of [126].



Ihara-Bass formula 

Specializing both cases further, we obtain the following formula, which is known as the Ihara-Bass formula:

$$Z_G(u)^{-1} = (1 - u^2)^{|E| - |V|} \det\big(I - u\mathcal{A} + u^2(\mathcal{D} - I)\big),$$

where $\mathcal{D}$ is the degree matrix and $\mathcal{A}$ is the adjacency matrix defined by

$$(\mathcal{D}f)(i) := d_i f(i), \qquad (\mathcal{A}f)(i) := \sum_{e \in \vec{E},\, t(e) = i} f(o(e)), \qquad f \in C(V).$$

Many authors have discussed proofs of the Ihara-Bass formula. The first proof was given by Bass [11] and others are found in [76, 110]. A combinatorial proof by Foata and Zeilberger is found in [37].
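
The Ihara-Bass formula can be checked numerically on small graphs. The following is a minimal sketch, assuming a simple graph (no loops or multiple edges) and a scalar weight $u$; the function names `directed_edge_matrix` and `ihara_bass_sides` are hypothetical and only serve this illustration. It compares $\det(I - u\mathcal{M})$, computed from the $2|E| \times 2|E|$ directed edge matrix, with the $|V| \times |V|$ determinant on the right-hand side.

```python
import numpy as np

def directed_edge_matrix(edges):
    """Directed edge matrix of a simple undirected graph.

    Rows and columns are indexed by directed edges (o, t). The entry for
    (e, e') is 1 when e' ends where e starts and e' is not the reverse of e.
    """
    darts = [(a, b) for a, b in edges] + [(b, a) for a, b in edges]
    m = np.zeros((len(darts), len(darts)))
    for r, (o, t) in enumerate(darts):
        for c, (o2, t2) in enumerate(darts):
            if t2 == o and o2 != t:
                m[r, c] = 1.0
    return m

def ihara_bass_sides(n_vertices, edges, u):
    """Return det(I - u M) and (1 - u^2)^(|E|-|V|) det(I - u A + u^2 (D - I))."""
    m = directed_edge_matrix(edges)
    lhs = np.linalg.det(np.eye(len(m)) - u * m)
    adj = np.zeros((n_vertices, n_vertices))
    for a, b in edges:
        adj[a, b] = adj[b, a] = 1.0
    deg = np.diag(adj.sum(axis=1))
    eye = np.eye(n_vertices)
    rhs = (1 - u**2) ** (len(edges) - n_vertices) * np.linalg.det(
        eye - u * adj + u**2 * (deg - eye))
    return lhs, rhs

# Complete graph on four vertices, evaluated at u = 0.3.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
lhs, rhs = ihara_bass_sides(4, k4, 0.3)
print(lhs, rhs)  # the two values should agree up to rounding
```

For the triangle, both sides reduce to $(1 - u^3)^2$, since its only prime cycles are the two orientations of the 3-cycle.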



3.4 Miscellaneous properties 

This section collects miscellaneous topics. In the first subsection, prime cycles are discussed in relation to hypergraph properties. In the second subsection, we present additional properties of the directed edge matrix and the one-variable hypergraph zeta function. These properties are utilized in the subsequent developments.
3.4.1 Prime cycles

Proposition 3.1. $\mathfrak{P}_H = \mathfrak{P}_{\mathrm{core}(H)}$.

Proof. Let $H = (V \cup F, E)$. The proof is by induction on $|E|$. If $H$ is a coregraph, the statement is trivial; if not, there is a directed edge $e \in E$ that satisfies $d_{s(e)} = 1$ or $d_{t(e)} = 1$. Obviously, there is no closed geodesic that goes through $e$. Therefore, removal of $e$ from $H$ does not affect the set of prime cycles. □

This proposition immediately implies that $\zeta_H = \zeta_{\mathrm{core}(H)}$.

The following proposition claims that trees can be characterized in terms of prime cycles.

Proposition 3.2. Let $H$ be a connected hypergraph. $H$ is a tree if and only if $\mathfrak{P}_H = \emptyset$.

Proof. The "only if" part is trivial from Propositions 3.1 and 2.1. For the "if" part, assume that $H$ is not a tree. Then there is a cycle and the cycle gives a closed geodesic. Therefore, $\mathfrak{P}_H \neq \emptyset$. □

3.4.2 Directed edge matrix 

First, we derive a simple formula for the determinant of the directed edge matrix $\mathcal{M}$. This
type of expression appears in the loop series expansion of the perfect matching problem in 
Section I 



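Theorem 3.3.

$$\det \mathcal{M} = \prod_{i \in V} (1 - d_i) \prod_{\alpha \in F} (1 - d_\alpha) \tag{3.22}$$

Proof. From Lemma 3.1 we have $\mathcal{M} = \iota T T^* - \iota$, where $\iota = \iota(1)$. From Eq. (3.9) and Proposition A.3, we see that $\det(\iota) = \prod_{\alpha \in F} (-1)^{d_\alpha - 1}(d_\alpha - 1)$. Using Proposition A.2, the assertion follows. □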
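
Eq. (3.22) is easy to test on a small hypergraph. The sketch below is an illustration under stated assumptions: a hypergraph is encoded as a list of vertex tuples (one per hyperedge), a directed edge is a pair (hyperedge index, vertex), and the entry for $(e, e')$ is set to 1 when $t(e') \in s(e)$, $t(e') \neq t(e)$ and $s(e') \neq s(e)$, which is the entrywise form of $\iota T T^* - \iota$ at $u = 1$. The helper names are hypothetical.

```python
import numpy as np

def hypergraph_edge_matrix(factors):
    """Directed edge matrix of a hypergraph given as a list of vertex tuples.

    A directed edge is a pair (a, i): s(e) = a is the index of a hyperedge
    and t(e) = i is one of its vertices.
    """
    darts = [(a, i) for a, verts in enumerate(factors) for i in verts]
    m = np.zeros((len(darts), len(darts)))
    for r, (a, i) in enumerate(darts):
        for c, (b, j) in enumerate(darts):
            # e' = (b, j) feeds into e = (a, i)
            if j in factors[a] and j != i and b != a:
                m[r, c] = 1.0
    return m

def degree_product(factors):
    """Right-hand side of Eq. (3.22): prod_i (1 - d_i) * prod_alpha (1 - d_alpha)."""
    vertex_degree = {}
    for verts in factors:
        for i in verts:
            vertex_degree[i] = vertex_degree.get(i, 0) + 1
    prod = 1.0
    for d in vertex_degree.values():
        prod *= 1 - d
    for verts in factors:
        prod *= 1 - len(verts)
    return prod

# Three hyperedges on four vertices; every vertex has degree two.
factors = [(0, 1, 2), (1, 2, 3), (0, 3)]
print(np.linalg.det(hypergraph_edge_matrix(factors)), degree_product(factors))
```

A hypergraph with a vertex of degree one, such as `[(0, 1), (1, 2)]`, gives a zero row in the matrix, in line with the vanishing factor $1 - d_i$.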

This formula implies that the matrix $\mathcal{M}$ is invertible if and only if no vertex or hyperedge has degree one, as is the case for a coregraph. Since $\zeta_H(u) = \zeta_{\mathrm{core}(H)}(u)$, the spectra (i.e., the sets of eigenvalues) of the directed edge matrices of $H$ and $\mathrm{core}(H)$ differ only by zero eigenvalues.

Next, we consider the irreducibility of the non-negative matrix $\mathcal{M}$.

Proposition 3.3. For a connected hypergraph $H$, $\mathcal{M}$ is irreducible if and only if $H$ is a coregraph and $n(H) \ge 2$.

Proof. By definition, $\mathcal{M}$ is irreducible iff, for arbitrary $e$ and $e' \in E$, there is a sequence of directed edges $(e_1, e_2, \ldots, e_k)$ such that $e_1 = e$, $e_l \rightharpoonup e_{l+1}$ $(l = 1, \ldots, k-1)$ and $e_k = e'$. If $H$ is a connected coregraph with $n(H) \ge 2$, i.e., with more than one cycle, we can construct such a sequence. If not, we cannot. (Details are omitted.) □

Another important question regarding the directed edge matrix $\mathcal{M}$ is its spectral radius, i.e., its Perron-Frobenius eigenvalue.

Proposition 3.4. For $e \in E$, let $k_e := |\{e' \in E ;\ e' \rightharpoonup e\}|$, $k_m := \min_e k_e$ and $k_M := \max_e k_e$. Then

$$k_m \le \rho(\mathcal{M}) \le k_M. \tag{3.23}$$

Therefore, if $\mathrm{core}(H) \neq \emptyset$, then $\rho(\mathcal{M}) \ge 1$. If $H$ is $(a, b)$-regular, $\rho(\mathcal{M}) = (a-1)(b-1)$. If $H$ is a graph,

$$\min_i d_i - 1 \le \rho(\mathcal{M}) \le \max_i d_i - 1. \tag{3.24}$$

Proof. Since $k_e = \sum_{e'} \mathcal{M}_{e,e'}$, the bound Eq. (3.23) is trivial from Theorem A.2. The second statement comes from $k_m \ge 1$ for non-empty coregraphs. □

Finally, let us consider the poles of $\zeta_H(u)$. Obviously, the pole closest to the origin is $u = \rho(\mathcal{M})^{-1} \ge k_M^{-1}$, and it is simple if $\mathcal{M}$ is irreducible. Furthermore, the following theorem implies that $\zeta_H(u)$ has a pole at $u = 1$ with multiplicity $n(H)$ if $H$ is connected and $n(H) \ge 2$.

Theorem 3.4 (Hypergraph Hashimoto's theorem). Let $\chi(H) := |V| + |F| - |E|$ be the Euler number of $H$. Then

$$\lim_{u \to 1} \zeta_H(u)^{-1} (1 - u)^{\chi(H) - 1} = \chi(H)\, \kappa(B_H), \tag{3.25}$$

where $\kappa(B_H)$ is the number of spanning trees of the bipartite graph $B_H$. (See Subsection 2.1.1 for the construction of $B_H$ from $H$.)

Proof. For a graph $G = (V, E)$, Hashimoto proved that [51, 52]

$$\lim_{u \to 1} Z_G(u)^{-1} (1 - u)^{-|E| + |V| - 1} = -2^{|E| - |V| + 1} (|E| - |V|)\, \kappa(G),$$

where $\kappa(G)$ is the number of spanning trees of $G$. A simple proof by Northshield is found in [93]. Since there is a one-to-one correspondence between $\mathfrak{P}_H$ and $\mathfrak{P}_{B_H}$, we have $\zeta_H(u) = Z_{B_H}(\sqrt{u})$. Then the assertion is proved by the above formula, using $1 - u = (1 - \sqrt{u})(1 + \sqrt{u})$ and $|E| - |V| - |F| = -\chi(H)$ for $B_H$. □
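
The graph version of the limit formula quoted in the proof can be illustrated numerically. The sketch below makes two assumptions that are not part of the text: $Z_G(u)^{-1}$ is evaluated through the Ihara-Bass formula of Subsection 3.3.2, and the number of spanning trees is computed with the matrix-tree theorem (a cofactor of the graph Laplacian). The function name `hashimoto_check` is hypothetical.

```python
import numpy as np

def hashimoto_check(adj, u):
    """Compare Z_G(u)^{-1} (1-u)^{-|E|+|V|-1} near u = 1 with the limit value.

    adj is the 0/1 adjacency matrix of a connected simple graph.
    """
    n = len(adj)
    n_edges = int(adj.sum() / 2)
    deg = np.diag(adj.sum(axis=1))
    eye = np.eye(n)
    # Ihara-Bass formula for the reciprocal of the zeta function.
    zeta_inv = (1 - u**2) ** (n_edges - n) * np.linalg.det(
        eye - u * adj + u**2 * (deg - eye))
    approx = zeta_inv * (1 - u) ** (-n_edges + n - 1)
    # Matrix-tree theorem: any cofactor of the Laplacian counts spanning trees.
    laplacian = deg - adj
    kappa = np.linalg.det(laplacian[1:, 1:])
    exact = -(2 ** (n_edges - n + 1)) * (n_edges - n) * kappa
    return approx, exact

k4 = np.ones((4, 4)) - np.eye(4)
approx, exact = hashimoto_check(k4, 1 - 1e-4)
print(approx, exact)  # approx tends to exact as u -> 1
```

For the complete graph on four vertices, $\kappa(G) = 16$ and $|E| - |V| = 2$, so the limit value is $-2^3 \cdot 2 \cdot 16 = -256$.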

In [112], Storm showed that, if $H$ is an $(a, b)$-regular Ramanujan hypergraph, all nontrivial poles of $\zeta_H(u)$ lie on the circle of radius $[(a-1)(b-1)]^{-1/2}$. This property is analogous to the Riemann hypothesis (RH) of the Riemann zeta function, which claims that all nontrivial zeros of the Riemann zeta function have real part $1/2$. For the Ihara zeta function, a bound on the modulus of imaginary poles is found in [76].

3.5 Discussion 

In this chapter, we introduced our graph zeta function, generalizing graph zeta functions known in graph theory. The main contribution of this chapter is the Ihara-Bass type determinant formula, which extends the Ihara-Bass formula of the one-variable graph zeta function. The proof is based on a simple determinant formula in Proposition A.2, changing the size of the determinant from $\dim \mathfrak{X}(E)$ to $\dim \mathfrak{X}(V)$.

The Ihara-Bass type determinant formula plays an important role in the developments in the sequel, especially in the proof of the Bethe-zeta formula in Chapter 4. The formula is also used in the alternative derivation of the loop series in the perfect matching problem, showing intimate relations between the graph zeta function and the Bethe approximation.

The definition of our zeta function can be extended to a Bartholdi type zeta function, where closed geodesics are allowed to have backtracking [10]. The Ihara-Bass type determinant formula in Theorem 3.2 is also extended to this case without difficulty. A related work is found in [55].

In this thesis, we discuss the graph zeta function only because of its connection to the LBP algorithm and the Bethe free energy function. However, there are other contexts where the Ihara zeta function appears. We refer to [116] for a review of the Ihara zeta function and related topics.



Chapter 4 

Bethe-zeta formula 



4.1 Introduction 

The aim of this chapter is to show the "Bethe-zeta formula" and to demonstrate its applications. This formula provides a relation between the Hessian of the Bethe free energy function and the graph zeta function.

In Section 4.2, we prove the main formula using the Ihara-Bass type determinant formula proved in the previous chapter. In Section 4.3, as an application of the main formula, we analyze the region where the Hessian of the Bethe free energy function $F$ is positive definite. Section 4.4 discusses the stability of the LBP fixed points, extending results of Heskes [5l] and Mooij et al. [88]. The main formula is further applied to the uniqueness problem of the LBP fixed points in the next chapter.

4.1.1 Intuition for the Bethe zeta formula 

Beforehand, we describe the underlying mathematical structure that makes the Bethe-zeta formula hold. It is the "duality" between the two variational characterizations of the LBP fixed points given in Theorem 2.3.

Recall that the LBP fixed points are the intersections of the submanifolds $L(\mathcal{I})$ and $A(\mathcal{I}, \Psi)$. See Figure 4.1. The whole space is $Y \simeq \Theta$; an element of this set is a vector of all expectation/natural parameters of local exponential families. Each line stands for $L$ and $A$, respectively, and the intersection is an LBP fixed point. In the first figure, the submanifolds intersect transversally, while they intersect tangentially in the second figure. In the second case, both the Hessians of $F$ and $\mathcal{F}$ degenerate. Therefore, one can expect that

$$\det(\nabla^2 F) = 0 \iff \text{the intersection is tangential} \iff \det(\nabla^2 \mathcal{F}) = 0, \tag{4.1}$$

where $\nabla^2$ denotes the Hessian with respect to the coordinates of $L$ and $A$, respectively. After calculations (see Appendix B.2), one can see that

$$\nabla^2 \mathcal{F} = X[I - \mathcal{M}(u)]Y \tag{4.2}$$

holds at an LBP fixed point with certain matrices $X$ and $Y$.

These observations suggest that there is a relation like $\det(\nabla^2 F) = \det(I - \mathcal{M}(u)) \times (\text{factor})$ at LBP fixed points. Furthermore, since $A(\mathcal{I}, \Psi)$ moves depending on $\Psi$, one can expect that such relations hold at all points of $L(\mathcal{I})$.

Based on the techniques developed in the previous chapter, we will formulate the relations as an identity on $L$ rather than statements on LBP fixed points, i.e., the Bethe-zeta formula. The first advantage of this approach is its wide applicability. In fact, the Bethe-zeta formula will be utilized as a continuous function on $L$ in the proof of Theorem 4.2. The second advantage is the simplicity of the proof. This approach only involves linear algebraic calculations and is much easier than making the above observations rigorous.

4.2 Bethe-zeta formula 

In order to make the assertion clear, we first recall the definitions and notations. Let $H = (V, F)$ be a hypergraph and let $\mathcal{I} = \{\mathcal{E}_\alpha, \mathcal{E}_i\}$ be an inference family on $H$. The exponential families $\mathcal{E}_i$ and $\mathcal{E}_\alpha$ have sufficient statistics $\phi_i$ and $\phi_\alpha$ as discussed in Subsection 2.2.2. Furthermore, as discussed in Subsection 2.3.2, a point $\eta = \{\eta_{\langle\alpha\rangle}, \eta_i\} \in L$ is identified with a set of pseudomarginals $\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}$.

Theorem 4.1 (Bethe-zeta formula). At any point $\eta = \{\eta_{\langle\alpha\rangle}, \eta_i\} \in L$, the following equality holds:

$$\zeta_H(u)^{-1} = \det(\nabla^2 F) \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[\phi_\alpha]) \prod_{i \in V} \det(\mathrm{Var}_{b_i}[\phi_i])^{1 - d_i},$$

where

$$u^\alpha_{i \to j} := \mathrm{Var}_{b_i}[\phi_i]^{-1}\, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j] \tag{4.3}$$

is an $r_i \times r_j$ matrix.

Note that $\nabla^2 F$ is the Hessian matrix with respect to the coordinate $\{\eta_{\langle\alpha\rangle}, \eta_i\}$. The Hessian does not depend on the given compatibility functions $\Psi_\alpha$ because those only affect linear terms in $F$. So, the formula is associated with the inference family $\mathcal{I}$ alone.

By the definition of the inference family, all local exponential families $\mathcal{E}_\alpha$ and $\mathcal{E}_i$ satisfy Assumption 1. Therefore, the determinants of the variances appearing in the formula are always positive.

Note that the zeta function is given by the products of the weights Eq. (4.3) along prime cycles. This type of expression also appears in the covariance expression of distant vertices on a tree structured hypergraph. (See Appendix B.1.)



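Proof of Theorem 4.1. From the definition of the Bethe free energy function Eq. (2.36), the $(V,V)$-block of $\nabla^2 F$ is given by

$$\frac{\partial^2 F}{\partial \eta_i \partial \eta_i} = \sum_{\alpha \ni i} \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_i} + (1 - d_i) \frac{\partial^2 \varphi_i}{\partial \eta_i \partial \eta_i}, \qquad \frac{\partial^2 F}{\partial \eta_i \partial \eta_j} = \sum_{\alpha \supset \{i,j\}} \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_j} \quad (i \neq j).$$

The $(V,F)$-block and $(F,F)$-block are given by

$$\frac{\partial^2 F}{\partial \eta_i \partial \eta_{\langle\alpha\rangle}} = \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_{\langle\alpha\rangle}}, \qquad \frac{\partial^2 F}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\beta\rangle}} = \delta_{\alpha,\beta} \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}}.$$

Using the diagonal blocks of the $(F,F)$-block, we erase the $(V,F)$-block and $(F,V)$-block of the Hessian. In other words, we choose a square matrix $X$ such that $\det X = 1$ and

$$X^T (\nabla^2 F) X = \begin{pmatrix} Y & 0 \\ 0 & \left( \dfrac{\partial^2 F}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\beta\rangle}} \right) \end{pmatrix}. \tag{4.4}$$

Then we obtain

$$Y_{i,i} = \sum_{\alpha \ni i} \left\{ \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_i} - \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_i} \right\} + (1 - d_i) \frac{\partial^2 \varphi_i}{\partial \eta_i \partial \eta_i}, \tag{4.5}$$

$$Y_{i,j} = \sum_{\alpha \supset \{i,j\}} \left\{ \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_j} - \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_j} \right\} \quad (i \neq j). \tag{4.6}$$

On the other hand, since $u^\alpha_{i \to j} := \mathrm{Var}_{b_i}[\phi_i]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j]$, the matrix $U_\alpha$ defined in Eq. (3.10) is

$$U_\alpha = \mathrm{diag}\big(\mathrm{Var}[\phi_i]^{-1} \mid i \in \alpha\big)\, \mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]. \tag{4.7}$$

Since the matrix $\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]$ is a submatrix of $\mathrm{Var}_{b_\alpha}[\phi_\alpha]$, its inverse can be expressed by submatrices of $\mathrm{Var}_{b_\alpha}[\phi_\alpha]^{-1} = \frac{\partial^2 \varphi_\alpha}{\partial \eta_\alpha \partial \eta_\alpha}$ using Proposition A.1. Therefore, the elements of $W_\alpha = U_\alpha^{-1}$ are given by

$$w^\alpha_{i \to j} = \left\{ \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_j} - \frac{\partial^2 \varphi_\alpha}{\partial \eta_i \partial \eta_{\langle\alpha\rangle}} \left( \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}} \right)^{-1} \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_j} \right\} \mathrm{Var}[\phi_j]. \tag{4.8}$$

Combining Eqs. (4.5), (4.6) and (4.8), we obtain

$$Y\, \mathrm{diag}\big(\mathrm{Var}[\phi_i] \mid i \in V\big) = I - \mathcal{D} + \mathcal{W}, \tag{4.9}$$

where $\mathcal{D}$ and $\mathcal{W}$ are defined in Eq. (3.12). Accordingly, we obtain

$$\begin{aligned}
\zeta_H(u)^{-1} &= \det(I - \mathcal{D} + \mathcal{W}) \prod_{\alpha \in F} \det U_\alpha \\
&= \det Y \prod_{i \in V} \det(\mathrm{Var}[\phi_i]) \prod_{\alpha \in F} \frac{\det(\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}])}{\prod_{i \in \alpha} \det(\mathrm{Var}[\phi_i])} \\
&= \det(\nabla^2 F) \prod_{\alpha \in F} \det\left( \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}} \right)^{-1} \prod_{i \in V} \det(\mathrm{Var}[\phi_i])^{1 - d_i} \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]) \\
&= \det(\nabla^2 F) \prod_{\alpha \in F} \det(\mathrm{Var}_{b_\alpha}[\phi_\alpha]) \prod_{i \in V} \det(\mathrm{Var}[\phi_i])^{1 - d_i},
\end{aligned}$$

where we used $\det(\mathrm{Var}_{b_\alpha}[(\phi_i)_{i \in \alpha}]) \det\left( \frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}} \right)^{-1} = \det(\mathrm{Var}[\phi_\alpha])$, which is proved by Proposition A.1. □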
4.2.1 Case 1: Multinomial inference family

In the rest of this section, we rewrite the Bethe-zeta formula for specific cases. In particular, we give explicit expressions of the determinants of the variances. First, we consider the multinomial case.

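Lemma 4.1. Let $\phi$ be the sufficient statistics of the multinomial distributions on $X = \{1, 2, \ldots, N\}$ defined in Example 7. Then the determinant of the variance is given by

$$\det(\mathrm{Var}_p[\phi]) = \prod_{k=1}^{N} p(k). \tag{4.10}$$

Proof. From the definition of the sufficient statistics, one easily observes that $(\mathrm{Var}[\phi])_{i,j} = \delta_{i,j}\, p(i) - p(i)p(j)$. Therefore

$$\det(\mathrm{Var}[\phi]) = \det \begin{pmatrix} p(1) & & \\ & \ddots & \\ & & p(N-1) \end{pmatrix} \det\left( I - \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \begin{pmatrix} p(1) & \cdots & p(N-1) \end{pmatrix} \right) = \prod_{k=1}^{N-1} p(k) \left( 1 - \sum_{k=1}^{N-1} p(k) \right) = \prod_{k=1}^{N} p(k).$$

□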
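
Lemma 4.1 can be confirmed numerically in a few lines. The sketch below assumes, as the proof suggests, that the sufficient statistics are the indicator functions of the first $N - 1$ states, so that the variance matrix is $\mathrm{diag}(p) - pp^T$ restricted to those states; the function name `multinomial_variance` is hypothetical.

```python
import numpy as np

def multinomial_variance(p):
    """Covariance matrix of the indicators of the first N-1 states.

    The (i, j) entry is delta_ij * p(i) - p(i) * p(j).
    """
    q = np.asarray(p, dtype=float)[:-1]
    return np.diag(q) - np.outer(q, q)

p = np.array([0.1, 0.2, 0.3, 0.4])
# Both numbers should coincide: det(Var_p[phi]) and the product of all p(k).
print(np.linalg.det(multinomial_variance(p)), np.prod(p))
```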

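Corollary 4.1 (Bethe-zeta formula for multinomial inference family). For any pseudomarginals $\{b_\alpha(x_\alpha), b_i(x_i)\} \in L$, the following equality holds:

$$\zeta_H(u)^{-1} = \det(\nabla^2 F) \prod_{\alpha \in F} \prod_{x_\alpha} b_\alpha(x_\alpha) \prod_{i \in V} \prod_{x_i} b_i(x_i)^{1 - d_i},$$

where $u^\alpha_{i \to j} := \mathrm{Var}_{b_i}[\phi_i]^{-1} \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j]$ is an $r_i \times r_j$ matrix.

For the binary and pairwise case, this formula was first shown in [126].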



4.2.2 Case 2: Fixed-mean Gaussian inference family 

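Let $G = (V, E)$ be a graph. We consider the fixed-mean Gaussian inference family on $G$. For a given vector $\mu = (\mu_i)_{i \in V}$, the inference family is constructed from the sufficient statistics

$$\phi_i(x_i) = (x_i - \mu_i)^2 \quad \text{and} \quad \phi_{\langle ij \rangle}(x_i, x_j) = (x_i - \mu_i)(x_j - \mu_j). \tag{4.11}$$

The expectation parameters of them are denoted by $\eta_{ii}$ and $\eta_{ij}$, respectively. The variances and covariances are

$$\mathrm{Var}[\phi_i] = 2\eta_{ii}^2, \qquad \mathrm{Var}[\phi_{ij}] = \begin{pmatrix} 2\eta_{ii}^2 & 2\eta_{ij}^2 & 2\eta_{ii}\eta_{ij} \\ 2\eta_{ij}^2 & 2\eta_{jj}^2 & 2\eta_{jj}\eta_{ij} \\ 2\eta_{ii}\eta_{ij} & 2\eta_{jj}\eta_{ij} & \eta_{ii}\eta_{jj} + \eta_{ij}^2 \end{pmatrix}, \tag{4.12}$$

where

$$\phi_{ij}(x_i, x_j) = \big((x_i - \mu_i)^2,\ (x_j - \mu_j)^2,\ (x_i - \mu_i)(x_j - \mu_j)\big).$$

Therefore, $\det(\mathrm{Var}[\phi_{ij}]) = 4(\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3$.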
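
The determinant identity for the pairwise Gaussian statistics can be checked directly. The following sketch builds the $3 \times 3$ variance matrix of $((x_i - \mu_i)^2, (x_j - \mu_j)^2, (x_i - \mu_i)(x_j - \mu_j))$ from the standard fourth-moment formulas of a bivariate Gaussian (Isserlis' theorem) and compares its determinant with $4(\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3$. The function name `pair_statistic_variance` is hypothetical.

```python
import numpy as np

def pair_statistic_variance(eta_ii, eta_jj, eta_ij):
    """Variance matrix of the pairwise statistics, as in Eq. (4.12)."""
    return np.array([
        [2 * eta_ii**2, 2 * eta_ij**2, 2 * eta_ii * eta_ij],
        [2 * eta_ij**2, 2 * eta_jj**2, 2 * eta_jj * eta_ij],
        [2 * eta_ii * eta_ij, 2 * eta_jj * eta_ij, eta_ii * eta_jj + eta_ij**2],
    ])

eta_ii, eta_jj, eta_ij = 2.0, 3.0, 1.0
var = pair_statistic_variance(eta_ii, eta_jj, eta_ij)
closed_form = 4 * (eta_ii * eta_jj - eta_ij**2) ** 3
print(np.linalg.det(var), closed_form)  # both equal 500 up to rounding
```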



Corollary 4.2 (Bethe-zeta formula for fixed-mean Gaussian inference family). For any pseudomarginals $\{\eta_{ii}, \eta_{ij}\} \in L$, the following equality holds:

$$Z_G(u)^{-1} = \det(\nabla^2 F)\; 2^{|V|} \prod_{ij \in E} (\eta_{ii}\eta_{jj} - \eta_{ij}^2)^3 \prod_{i \in V} \eta_{ii}^{2(1 - d_i)},$$

where $u_{i \to j} := \eta_{ij}^2\, \eta_{ii}^{-2}$ is a scalar value.

One interesting point of this case is that the edge weights $u_{i \to j}$ are always non-negative.



4.3 Application to positive definiteness conditions 

The Bethe free energy function $F$ is not necessarily convex, though it is an approximation of the Gibbs free energy function, which is convex. Non-convexity of the Bethe free energy can lead to multiple fixed points. Pakzad et al. [95] and Heskes [55] have derived sufficient conditions for convexity and have shown that the Bethe free energy is convex for trees and graphs with one cycle. In this section, instead of such global structure, we focus on the local structure of the Bethe free energy function, i.e., the Hessian.

As an application of the Bethe-zeta formula, we derive a condition for positive definiteness of the Hessian of the Bethe free energy function. This condition is utilized to analyze a region where the Hessian is positive definite.

We will use the following notations. For a given square matrix $X$, $\mathrm{Spec}(X) \subset \mathbb{C}$ denotes the set of eigenvalues (the spectrum) and $\rho(X)$ the spectral radius of $X$, i.e., the maximum of the moduli of the eigenvalues.
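4.3.1 Positive definiteness conditions

Lemma 4.2. Let $\eta = \{\eta_{\langle\alpha\rangle}, \eta_i\} \in L$. If $\mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j] = 0$ for all $\alpha \in F$ and $i, j \in \alpha$ $(i \neq j)$, then $\nabla^2 F(\eta)$ is a positive definite matrix.

Proof. We use the notations following Theorem 4.1. The assumption of this lemma means $u^\alpha_{i \to j} = 0$. Since $W_\alpha = U_\alpha^{-1} = I$, we have $w^\alpha_{i \to j} = \delta_{i,j}$. Therefore, $Y_{i,j} = \mathrm{Var}[\phi_i]^{-1} \delta_{i,j}$ and $Y$ is a positive definite matrix. Furthermore, $\frac{\partial^2 \varphi_\alpha}{\partial \eta_{\langle\alpha\rangle} \partial \eta_{\langle\alpha\rangle}}$ is a positive definite matrix because it is a submatrix of the positive definite matrix $\frac{\partial^2 \varphi_\alpha}{\partial \eta_\alpha \partial \eta_\alpha} = \mathrm{Var}[\phi_\alpha]^{-1}$. Therefore, from Eq. (4.4), $\nabla^2 F$ is positive definite. □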

Theorem 4.2. Let $\mathcal{I}$ be a multinomial or fixed-mean Gaussian inference family. Let $u$ be given by $\eta \in L$ using Eq. (4.3). Then,

$$\mathrm{Spec}(\mathcal{M}(u)) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1} \implies \nabla^2 F(\eta) \text{ is a positive definite matrix.}$$

Proof. We give proofs for each case.

Case 1: Multinomial

The given $\eta$ is identified with a set of pseudomarginals $\{b_\alpha(x_\alpha), b_i(x_i)\}$. We define $\eta(t)$ $(t \in [0, 1])$ by the set of pseudomarginals $b_\alpha(t) := t\, b_\alpha + (1 - t) \prod_{i \in \alpha} b_i$ and $b_i(t) := b_i$. Obviously, $\eta(1) = \eta$, and $\eta(0)$ has zero covariances. From Lemma 4.2, it is enough to prove that $\det \nabla^2 F(\eta(t)) \neq 0$ on the interval $[0, 1]$ because all eigenvalues of the Hessian are real numbers. The covariances and variances at $t$ are

$$\mathrm{Cov}_{b_\alpha(t)}[\phi_i, \phi_j] = t\, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j], \qquad \mathrm{Var}_{b_i(t)}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]. \tag{4.13}$$

Therefore, $\mathcal{M}(u(t)) = t\, \mathcal{M}(u)$. The assumption of this theorem implies $\det(I - \mathcal{M}(u(t))) \neq 0$ on the interval. From Theorem 4.1, we conclude that $\det \nabla^2 F(\eta(t)) \neq 0$.

Case 2: Fixed-mean Gaussian

The proof is analogous to the above proof. We define $\eta(t) \in L$ by

$$\eta(t)_{ii} := \eta_{ii}, \qquad \eta(t)_{ij} := t\, \eta_{ij}. \tag{4.14}$$

From Eq. (4.12), we have

$$\mathrm{Cov}_{b_\alpha(t)}[\phi_i, \phi_j] = t^2\, \mathrm{Cov}_{b_\alpha}[\phi_i, \phi_j], \qquad \mathrm{Var}_{b_i(t)}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]. \tag{4.15}$$

Therefore $\mathcal{M}(u(t)) = t^2 \mathcal{M}(u)$. The remainder of the proof proceeds in the same manner.

□

4.3.2 Region of positive definiteness

In this section, we analyze conditions on the pseudomarginals that guarantee the positive definiteness of the Hessian. Our result says that if the correlation coefficient matrices are sufficiently small, then the Hessian is positive definite. This "smallness" criterion depends on the graph geometry.

First, we define correlation coefficient matrices.

Definition 11. Let $x, y$ be vector valued random variables following a probability distribution $p$. The correlation coefficient matrix of $x$ and $y$ is defined by

$$\mathrm{Cor}_p[y, x] := \mathrm{Var}_p[y]^{-1/2}\, \mathrm{Cov}_p[y, x]\, \mathrm{Var}_p[x]^{-1/2}. \tag{4.16}$$

Our approach for obtaining conditions for the positive definiteness is based on Theorem 4.2. Thus we would like to bound the eigenvalues of $\mathcal{M}(u)$. The following lemma implies that the spectrum of $\mathcal{M}(u)$ is determined by the correlation coefficient matrices.

Lemma 4.3. Let $u^\alpha_{i \to j}$ be given by Eq. (4.3) and $c^\alpha_{i \to j} := \mathrm{Cor}_{b_\alpha}[\phi_i, \phi_j]$. Then

$$\mathrm{Spec}(\mathcal{M}(u)) = \mathrm{Spec}(\mathcal{M}(c)). \tag{4.17}$$

Proof. Define $Z$ by $(Z)_{e,e'} := \delta_{e,e'} \mathrm{Var}[\phi_{t(e)}]^{1/2}$. Then

$$(Z \mathcal{M}(u) Z^{-1})_{e,e'} = \mathrm{Var}[\phi_{t(e)}]^{1/2}\, \mathcal{M}(u)_{e,e'}\, \mathrm{Var}[\phi_{t(e')}]^{-1/2} = \mathcal{M}(c)_{e,e'}. \tag{4.18}$$

□

Next, we define the operator norm of matrices because we need to measure the "smallness" of a correlation coefficient matrix.

Definition 12. Let $V_1$ and $V_2$ be finite dimensional normed vector spaces and let $X$ be a linear operator from $V_1$ to $V_2$. The operator norm of $X$ is defined by

$$\|X\| := \max_{\|x\| = 1} \|Xx\|. \tag{4.19}$$

Since $V_1$ is finite dimensional, the maximum exists. By definition, $\rho(X) \le \|X\|$ and $\|XY\| \le \|X\| \|Y\|$ hold.

The operator norm depends on the choice of the norms of $V_1$ and $V_2$. If the norms in $V_1$ and $V_2$ are given by their inner products, the induced operator norm is denoted by $\|\cdot\|_2$. Then $\|X\|_2$ is equal to the maximum singular value of $X$. In this case, the norm of a correlation coefficient matrix is smaller than 1. (See Proposition A.4 in Appendix A.2.)
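
The last statement can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it draws a random positive definite joint covariance for a pair $(y, x)$, forms the correlation coefficient matrix of Eq. (4.16) with symmetric inverse square roots obtained from an eigendecomposition, and prints its largest singular value. The helper names `inv_sqrt` and `correlation_matrix` are hypothetical.

```python
import numpy as np

def inv_sqrt(sym):
    """Inverse square root of a symmetric positive definite matrix."""
    w, v = np.linalg.eigh(sym)
    return (v / np.sqrt(w)) @ v.T

def correlation_matrix(cov, ny):
    """Cor[y, x] for a joint covariance of (y, x); y is the first ny coordinates."""
    var_y = cov[:ny, :ny]
    var_x = cov[ny:, ny:]
    cov_yx = cov[:ny, ny:]
    return inv_sqrt(var_y) @ cov_yx @ inv_sqrt(var_x)

rng = np.random.default_rng(0)
a = rng.normal(size=(5, 8))
cov = a @ a.T / 8  # positive definite with probability one
cor = correlation_matrix(cov, 2)
print(np.linalg.norm(cor, 2))  # largest singular value, expected below 1
```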



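Lemma 4.4. Let

$$\kappa := \max_{\substack{\beta \in F \\ i, j \in \beta,\ i \neq j}} \|\mathrm{Cor}_{b_\beta}[\phi_i, \phi_j]\| \tag{4.20}$$

and let $\alpha$ be the Perron-Frobenius eigenvalue of $\mathcal{M}$. Then

$$\rho(\mathcal{M}(u)) \le \kappa \alpha. \tag{4.21}$$

Proof. From Lemma 4.3, we consider the spectral radius of $\mathcal{M}(c)$. It is enough to prove that $\det(I - z\mathcal{M}(c))$ does not have any root in $\{\lambda \in \mathbb{C} \mid |\lambda| < \kappa^{-1}\alpha^{-1}\}$. Accordingly, we show that $\zeta_H(zc)$ does not have any pole in the set. If $H$ is a tree, the statement is trivial. Thus, from Proposition 3.4, we assume $\alpha \ge 1$ in the following. Let $p$ be a prime cycle and let $\lambda_1, \ldots, \lambda_r$ be the eigenvalues of $\pi(p; c)$. Then we obtain $\max_i |\lambda_i| \le \kappa^{|p|}$ because of the properties of operator norms. From this inequality, if $|z\kappa| < \alpha^{-1} \le 1$, we obtain

$$\big|\det(I - z^{|p|} \pi(p; c))\big| \ge \big(1 - |z\kappa|^{|p|}\big)^r.$$

Therefore, if $|z| < \kappa^{-1}\alpha^{-1}$,

$$|\zeta_H(zc)| = \Big| \prod_{p \in \mathfrak{P}} \det(I - z^{|p|} \pi(p; c))^{-1} \Big| \le \prod_{p \in \mathfrak{P}} \big(1 - |z\kappa|^{|p|}\big)^{-r} = \zeta_H(|z\kappa|)^r < \infty. \tag{4.22}$$

□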



The following theorem gives an explicit description of a region where the Hessian is positive definite in terms of the correlation coefficient matrices of the pseudomarginals.

Theorem 4.3. Let $\mathcal{I}$ be a multinomial or a fixed-mean Gaussian inference family. Let $\alpha$ be the Perron-Frobenius eigenvalue of $\mathcal{M}$ and define

$$L_{\alpha^{-1}}(\mathcal{I}) := \big\{ \{b_\beta(x_\beta), b_i(x_i)\} \in L(\mathcal{I}) \ \big|\ \forall \beta \in F,\ \forall i, j \in \beta\ (i \neq j),\ \|\mathrm{Cor}_{b_\beta}[\phi_i, \phi_j]\| < \alpha^{-1} \big\}.$$

Then, the Hessian $\nabla^2 F$ is positive definite on $L_{\alpha^{-1}}(\mathcal{I})$.

Proof. Obviously, $\kappa < \alpha^{-1}$ holds in $L_{\alpha^{-1}}$. From Lemma 4.4, $\mathrm{Spec}(\mathcal{M}(u)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$. From Theorem 4.2, the Hessian is positive definite. □

Properties of the Perron-Frobenius eigenvalue of $\mathcal{M}$, such as bounds, are discussed in Subsection 3.4.2. Roughly speaking, as the degrees of factors and vertices increase, $\alpha$ also increases and thus $L_{\alpha^{-1}}$ shrinks.

The region $L_{\alpha^{-1}}$ depends on the choice of the operator norms. If we choose the norm $\|\cdot\|_2$, this theorem immediately implies the convexity of the Bethe free energy function of tree and one-cycle hypergraphs, as we will discuss in the next subsection.

4.3.3 Convexity condition

Since the Hessian $\nabla^2 F$ does not depend on the given compatibility functions $\Psi = \{\Psi_\alpha\}$, the convexity of $F$ solely depends on the given inference family and the factor graph. For the multinomial case, Pakzad and Anantharam have shown that the Bethe free energy function is convex if the factor graph has at most one cycle [95]. The following theorem extends the result.

Theorem 4.4. Let $\mathcal{I}$ be a multinomial or fixed-mean Gaussian inference family associated with a connected factor graph $H$. Then

$$F \text{ is convex on } L \iff n(H) = 0 \text{ or } 1. \tag{4.23}$$

Proof. ($\Leftarrow$) Here, we give a proof based on Theorem 4.3, which assumes that the inference family is multinomial or fixed-mean Gaussian. However, this direction of the statement is valid for any inference family. (See Appendix B.3.)

From Proposition 3.4, the Perron-Frobenius eigenvalue $\alpha$ is equal to $0$ if $n(H) = 0$ and to $1$ if $n(H) = 1$. Using Theorem 4.3 with the norm $\|\cdot\|_2$ and Proposition A.4, we obtain $L_{\alpha^{-1}} = L$. Therefore, the Bethe free energy function is convex over the domain $L$.

($\Rightarrow$) We prove the statement for each inference family.
Fixed-mean Gaussian case

Let $G = (V, E)$ be a graph. For $t \in [0, 1)$, let us define $\eta_{ii}(t) := 1$ and $\eta_{ij}(t) := t$. Accordingly, $u_{i \to j} = t^2$ and $\eta(t) \in L$. As $t \nearrow 1$, $\eta(t)$ approaches a boundary point of $L$. From Corollary 4.2 and Theorem 3.4,

$$\det(\nabla^2 F(t)) (1 - t^2)^{2|E| + |V| - 1} = 2^{-|V|} Z_G(t^2)^{-1} (1 - t^2)^{-|E| + |V| - 1} \longrightarrow -2^{|E| - 2|V| + 1} (|E| - |V|)\, \kappa(G) \qquad (t \to 1).$$

If $n(G) = |E| - |V| + 1 > 1$, the limit value is negative. Therefore, at least in a neighborhood of the limit point, $\nabla^2 F$ is not positive definite.

Multinomial case

First, we consider the binary case, i.e., $\phi_i(x_i) = x_i \in \{\pm 1\}$. For $t \in [0, 1)$, let us define $\eta_{ij}(t) := t$ and $\eta_i(t) := 0$. Accordingly, $u^\alpha_{i \to j} = t$ and $\eta(t) \in L$. As $t \nearrow 1$, $\eta(t)$ approaches a boundary point of $L$. Using Theorem 3.4 analogously to the fixed-mean Gaussian case, we see that $\det(\nabla^2 F(t))$ becomes negative as $t \to 1$ if $n(H) = 1 - \chi(H) > 1$. Therefore, $F$ is not convex on $L$.

For general multinomial inference families, the non-convexity of $F$ is deduced from the binary case. There is a face of (the closure of) $L(\mathcal{I})$ that is identified with the set of pseudomarginals of the binary inference family on the same factor graph. From Eq. (2.38) and $0 \log 0 = 0$, we see that the restriction of $F$ to the face is the Bethe free energy function of the binary inference family. Since this restriction is not convex, $F$ is not convex. □



4.4 Stability of LBP 

In this section we discuss the local stability of LBP and the local structure of the Bethe free energy around an LBP fixed point. In the celebrated paper [135], which introduced the equivalence between LBP and the Bethe approximation, Yedidia et al. empirically found that locally stable LBP fixed points are local minima of the Bethe free energy function. Heskes has shown that a locally stable fixed point of arbitrarily damped LBP is a local minimum of the Bethe free energy function for multinomial models [54]. In this section, we extend this property to the fixed-mean Gaussian case, applying our spectral conditions of stability and positive definiteness. Since the converse of the property is not necessarily true in general, we also elucidate the gap.

First, we regard the LBP update as a dynamical system. As discussed in Section 2.2.5, the (parallel) LBP algorithm can be formulated as repeated applications of a map $T$ on the natural parameters of the messages. An explicit expression of this map is given in Eq. (2.29). The derivative $T'$ at the fixed point, which is computed in Theorem 2.2, is rewritten as follows.

Proposition 4.1. At an LBP fixed point $\eta \in L$,

$$T' = \mathcal{M}(u), \tag{4.24}$$

where $u = \{u^\alpha_{i \to j}\}$ is given by Eq. (4.3).

4.4.1 Spectral conditions

Let $T$ be the LBP update map. A fixed point $\mu^*$ is called locally stable¹ if LBP starting from a point sufficiently close to $\mu^*$ converges to $\mu^*$. To suppress oscillatory behaviors of LBP, damping of the update, $T_\epsilon := (1 - \epsilon)T + \epsilon I$, is sometimes useful, where $0 \le \epsilon < 1$ is the damping strength and $I$ is the identity map.

As we will summarize in the following theorem, the local stability is determined by the linearization $T'$ at the fixed point. Since $T'$ is nothing but $\mathcal{M}(u)$ at an LBP fixed point, the Bethe-zeta formula naturally yields relations between the local stability and the Hessian of the Bethe free energy function.

Theorem 4.5. Let $\mathcal{I}$ be a multinomial or a fixed-mean Gaussian model. Let $\mu^*$ be an LBP fixed point and assume, for simplicity, that $T'(\mu^*)$ has no eigenvalues of unit modulus. Then the following statements hold.

1. $\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\} \iff$ LBP is locally stable at $\mu^*$.

2. $\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid \mathrm{Re}\,\lambda < 1\} \iff$ LBP is locally stable at $\mu^*$ with some damping.

3. $\mathrm{Spec}(T'(\mu^*)) \subset \mathbb{C} \setminus \mathbb{R}_{\ge 1} \implies \mu^*$ is a local minimum of BFE.

Proof. 1.: This is a standard result. (See [39] for example.) 2.: There is an $\epsilon \in [0, 1)$ that satisfies $\mathrm{Spec}(T'_\epsilon(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$ if and only if $\mathrm{Spec}(T'(\mu^*)) \subset \{\lambda \in \mathbb{C} \mid \mathrm{Re}\,\lambda < 1\}$. 3.: This assertion is a direct consequence of Theorem 4.2 and Proposition 4.1. □
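
The second statement rests on the fact that damping maps each eigenvalue $\lambda$ of $T'$ to $(1 - \epsilon)\lambda + \epsilon$. The sketch below illustrates this on a hypothetical $2 \times 2$ linearization (not derived from any model in the text) whose eigenvalues $\pm 1.5\mathrm{i}$ have real part below 1 but modulus above 1; the function names are likewise hypothetical.

```python
import numpy as np

def damped_spectrum(jacobian, eps):
    """Eigenvalues of the damped linearization (1 - eps) * T' + eps * I."""
    n = len(jacobian)
    return np.linalg.eigvals((1 - eps) * jacobian + eps * np.eye(n))

def is_contracting(eigs):
    """True when all eigenvalues lie strictly inside the unit disc."""
    return bool(np.max(np.abs(eigs)) < 1)

# Eigenvalues +-1.5i: unstable without damping, stable with enough damping.
jac = np.array([[0.0, -1.5], [1.5, 0.0]])
print(is_contracting(damped_spectrum(jac, 0.0)))  # False
print(is_contracting(damped_spectrum(jac, 0.7)))  # True
```

With $\epsilon = 0.7$ the damped eigenvalues are $0.7 \pm 0.45\mathrm{i}$, of modulus about $0.83$. An eigenvalue with real part at least 1, by contrast, stays outside the open unit disc for every $\epsilon \in [0, 1)$.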

¹ This property is often referred to as asymptotic stability [49].
This theorem immediately resolves the conjecture of Yedidia et al. [135]: locally stable LBP fixed points are local minima of the Bethe free energy. Since they only discuss multinomial models, their experiments seem to have been performed for these models. Extending the statement, the theorem implies that, for both multinomial and fixed-mean Gaussian cases, locally stable fixed points of arbitrarily damped LBP are local minima of the Bethe free energy function.

Heskes [33] has proved that locally stable fixed points are "local minima of the Bethe free energy," where the Bethe free energy function is defined on a restricted set. We give the following remarks to see that his result actually resolves Yedidia's conjecture. He considers the Bethe free energy function on a subset of $L$;



where $\theta_{\langle\alpha\rangle}$ is a constant given by the compatibility function from Eq. (2.21). In other words, $S$ is the set of pseudomarginals $\{b_\alpha(x_\alpha), b_i(x_i)\}$ that satisfy



Obviously, all LBP fixed points are in $S$. We can take a coordinate $\{\eta_i\}_{i \in V}$ of $S$ because $\eta_{\langle\alpha\rangle}$ is a function of $\{\eta_i\}_{i \in \alpha}$. The restriction of the Bethe free energy function $F$ to this set is denoted by $\hat{F}$. It is straightforward to check that the stationary points of $\hat{F}$ correspond to the LBP fixed points, that is,

$$\forall j \in V, \quad \frac{\partial \hat{F}(\{\eta_i\})}{\partial \eta_j} = 0 \iff \{\eta_i\} \in S \text{ is an LBP fixed point.}$$

In [33], for multinomial models, Heskes has shown that if $\{\eta_i\} \in S$ is a locally stable fixed point of (arbitrarily damped) LBP, it is a local minimum of $\hat{F}$. This statement is equivalent to the statement with "local minimum of $\hat{F}$" replaced by "local minimum of $F$." In fact, we can easily check that the positive definiteness of the Hessian of $\hat{F}$ is equivalent to that of $F$.

It is interesting to ask under which conditions a local minimum of the Bethe free energy function is a locally stable fixed point of (damped) LBP. An implicit reason for the empirical success of the LBP algorithm is that LBP finds a "good" local minimum rather than a local minimum near the initial point. The theorem gives a partial answer to the question, i.e., the difference between stable local minima and unstable local minima, in terms of the spectrum of $T'(\mu^*)$. It is noteworthy that we do not know whether the converse of the third statement holds.

4.4.2 Pairwise binary case 

Here we focus on binary pairwise attractive models. In this special case, the stable fixed
points of LBP and the local minima of the Bethe free energy function are more closely related.

The given graphical model $\Psi = \{\Psi_{ij}, \Psi_i\}$ is called attractive if $J_{ij} > 0$, where $\Psi_i(x_i) = \exp(h_i x_i)$
and $\Psi_{ij}(x_i, x_j) = \exp(J_{ij} x_i x_j)$ ($x_i, x_j \in \{\pm 1\}$). The following theorem implies
that if a stable fixed point becomes unstable by changing $J_{ij}$ and $h_i$, the corresponding
local minimum also disappears.

Theorem 4.6. Let us consider continuously parameterized attractive models $\{\Psi_{ij}(t), \Psi_i(t)\}$,
e.g., t is a temperature: $\Psi_{ij}(t) = \exp(t^{-1} J_{ij} x_i x_j)$ and $\Psi_i(t) = \exp(t^{-1} h_i x_i)$. For given t,
run the LBP algorithm and find a (stable) fixed point. If we continuously change t and see that the
LBP fixed point becomes unstable across $t = t_0$, then the corresponding local minimum of
the Bethe free energy becomes a saddle point across $t = t_0$.

Proof. From Eq. (2.26), we see that $b_{ij}(x_i, x_j) \propto \exp(J_{ij} x_i x_j + \theta_i x_i + \theta_j x_j)$ for some $\theta_i$ and
$\theta_j$. From $J_{ij} > 0$, we have $\mathrm{Cov}_{b_{ij}}[x_i, x_j] > 0$, and thus the edge weights $u$ are positive. When the LBP fixed
point becomes unstable, the Perron-Frobenius eigenvalue of $\mathcal{M}(u)$ goes over 1, which means that
$\det(I - \mathcal{M}(u))$ crosses 0. From Theorem 4.1, we see that $\det(\nabla^2 F)$ changes from positive to
negative at $t = t_0$. □

Theorem 4.6 extends Theorem 2 of [88], which discusses only the case of vanishing local
fields $h_i = 0$ and the trivial fixed point (i.e., $\mathrm{E}_{b_i}[x_i] = 0$).

4.5 Discussion 

This chapter developed the Bethe-zeta formula for general inference families including multi- 
nomial and Gaussian families. The formula says that the determinant of the Hessian of the 
Bethe free energy function is the reciprocal of the graph zeta function up to positive factors. 
The underlying mathematical structure that makes the Bethe-zeta formula hold was the two 
dualistic variational characterizations of the LBP fixed points. In the proof of the formula, 
we utilized the language of exponential families and the graph zeta function, including dual
convex functions and the Ihara-Bass type determinant formula. The key condition satisfied
on the set L was $\mathrm{Var}_{b_\alpha}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]$.

In Sections 4.3 and 4.4, we discussed applications of the formula, demonstrating its
utility. First, we applied the formula to analyze the positive definiteness of the Bethe free
energy function, focusing on multinomial and fixed-mean Gaussian inference families. Our
analysis showed that the region where the Hessian of the Bethe free energy function is
positive definite shrinks as the pole of the Ihara zeta function $\alpha^{-1}$ approaches zero.
Secondly, we analyzed the stability of the LBP algorithm. At an LBP fixed point, the
matrix $\mathcal{M}(u)$, which appears in the first determinant formula of the graph zeta function, is equal
to the linearization of the LBP update. This connection yields the relation between the
local minimality and the local stability of the fixed point. Applying our spectral conditions,
we gave a simple proof of Yedidia's conjecture: locally stable fixed points of LBP are
local minima of the Bethe free energy function.

The Bethe-zeta formula shows that the Bethe free energy function contains information 
on the graph geometry, especially on the prime cycles. At the same time, the formula helps 
to extract graph information from the Bethe free energy function. We observed that some 
values derived from the Bethe free energy function are related to graph characteristics such 
as the number of the spanning trees. 

For a tree-structured hypergraph, the LBP algorithm and the Bethe free energy function
are in a sense trivial; the graph zeta function is also trivial for a tree. The results of this
chapter provide concrete mathematical relations between these trivialities. For example,
$\zeta_H(u) = 1$ implies that there is no pole, and thus $\alpha^{-1} = \infty$. Therefore, the Bethe free
energy function is convex and the linearization T' at the unique fixed point is a nilpotent
matrix, which is necessary for the finite step termination of the LBP algorithm.
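
The nilpotency on trees can be checked on a toy example. The following is a minimal sketch, assuming the unweighted directed edge (non-backtracking) matrix of an ordinary graph as a stand-in for the linearization; the helper names and the two test graphs are illustrative and not taken from the thesis.

```python
def nonbacktracking_matrix(edges):
    """0/1 matrix indexed by directed edges: entry (e, e') is 1 when e' starts
    where e ends and e' is not the reversal of e (assumed convention)."""
    darts = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    return [[1 if (j == k and l != i) else 0 for (k, l) in darts]
            for (i, j) in darts]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def is_nilpotent(M):
    # An n x n matrix is nilpotent exactly when its n-th power vanishes.
    P = M
    for _ in range(len(M) - 1):
        P = matmul(P, M)
    return all(v == 0 for row in P for v in row)

tree = [(1, 2), (2, 3), (2, 4), (4, 5)]   # no cycle
triangle = [(1, 2), (2, 3), (3, 1)]       # one cycle

print(is_nilpotent(nonbacktracking_matrix(tree)))
print(is_nilpotent(nonbacktracking_matrix(triangle)))
```

On the tree every non-backtracking walk terminates at a leaf, so a high enough power of the matrix vanishes; on the triangle the matrix is a permutation matrix and no power vanishes.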



Chapter 5 



Uniqueness of LBP fixed point 

5.1 Introduction 

This chapter provides a new approach to analyze the uniqueness of the LBP fixed point.
Since the LBP fixed points are the solutions of the LBP update equation, it is natural to ask
whether a solution exists and, if so, whether it is unique. For multinomial cases, at least one fixed
point exists because the Bethe free energy function is bounded from below and a stationary
point of the Bethe free energy function is an LBP fixed point [136]. Furthermore, if the
underlying hypergraph is a tree or has one cycle, the solution is unique [128]; this result is obvious
as we have shown the convexity of the Bethe free energy functions for these hypergraphs in
Section 4.3.3.

From the viewpoint of approximate inference, the uniqueness of the LBP fixed point is a
preferable property. Since the LBP algorithm is interpreted as the variational problem of the
Bethe free energy function, an LBP fixed point that corresponds to the global minimum is
believed to be the best one. If we find the unique fixed point of the LBP algorithm, it is
guaranteed to be the global minimum.

For multinomial models, there are several works that give sufficient conditions for the
uniqueness property. In [55], Heskes analyzed the uniqueness problem by considering an
equivalent minimax problem. Other authors analyzed the convergence property rather than the
uniqueness. The LBP algorithm is said to be convergent if the messages converge to the
unique fixed point irrespective of the initial messages. By definition, this property is stronger
than the uniqueness. Tatikonda et al. [119] utilized the theory of Gibbs measures. They have
shown that the uniqueness of the Gibbs measure implies the convergence of the LBP algorithm.
Therefore, known sufficient conditions for the uniqueness of the Gibbs measure are also
sufficient conditions for the convergence of the LBP algorithm. Ihler et al. [61] and Mooij et al. [87] derived sufficient
conditions for the convergence by investigating conditions that make the LBP update a
contraction.

In this chapter, focusing on binary pairwise models, we propose a novel differential
topological approach to this problem. Empirically, one of the strongest conditions that is
applicable to arbitrary graphs is the spectral condition by Mooij et al. [87]. Our approach
is powerful enough to reproduce the condition. For hypergraphs with nullity two, we prove
the uniqueness property even though LBP is not necessarily convergent. Although our
discussions in this chapter are restricted to binary pairwise models, the method can
in principle be extended to multinomial models.

5.1.1 Idea of our approach 

In our approach, in combination with the Bethe-zeta formula, the index sum theorem is the
basic apparatus. Conceptually, the theorem has the following form:

$$\sum_{\eta:\ \text{LBP fixed point}} \mathrm{Index}(\eta) = 1, \qquad (5.1)$$

where $\mathrm{Index}(\eta)$ is +1 or -1 and is determined by local properties of the Bethe free energy
function at $\eta$. The sum is taken over all fixed points of LBP, that is, all stationary points
of the Bethe free energy function F. If we can guarantee that the index of any fixed point
is +1 in advance of running LBP, we conclude that the fixed point of LBP is unique.

The formula (5.1) might look surprising, but such formulas, which connect the global
and the local structure, are often seen in differential topology. The simplest example that
illustrates the idea of the theorem is sketched in Figure 5.1. In this example, the sum is
taken over all the stationary points of the function and the indices are assigned depending
on the sign of the second derivative at the points. When we deform the objective function,
the sum is still equal to one as long as the outward gradients are positive at the boundaries
(see Figure 5.2). This example suggests that the important feature for the formula is the
behavior of the function near the boundary of the domain.
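
The one-dimensional picture can be reproduced numerically. The sketch below is only an illustration under assumptions of our own choosing: the test function f(x) = x^4 - x^2 (plus linear tilts) on [-2, 2] and the grid-based root detection are not from the thesis. It sums +1 for each minimum and -1 for each maximum by tracking sign changes of the derivative.

```python
def index_sum(grad, a, b, n=100001):
    """Sum of indices of the stationary points in (a, b): +1 where the
    derivative changes from negative to positive (f'' > 0), -1 where it
    changes from positive to negative (f'' < 0)."""
    total = 0
    prev = grad(a)
    for k in range(1, n + 1):
        cur = grad(a + (b - a) * k / n)
        if prev < 0 <= cur:
            total += 1
        elif prev > 0 >= cur:
            total -= 1
        prev = cur
    return total

# Derivatives of f(x) = x**4 - x**2 + c*x for three tilts c.
# In all three cases the derivative is negative at x = -2 and positive at x = 2,
# i.e. the outward gradients at the boundary are positive.
double_well = lambda x: 4 * x**3 - 2 * x          # two minima, one maximum
tilted      = lambda x: 4 * x**3 - 2 * x + 0.3    # still three stationary points
single_well = lambda x: 4 * x**3 - 2 * x + 1.0    # only one minimum survives

for g in (double_well, tilted, single_well):
    print(index_sum(g, -2.0, 2.0))
```

Deforming the function changes the number of stationary points (three, three, one), but the signed count stays the same as long as the boundary behavior is preserved.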

Simsek et al. have shown an index sum formula, called the generalized Poincaré-Hopf
theorem [107]. In this formula, the indices of the stationary points in the (not necessarily
smooth) boundary are summed as well as those in the interior. The theorem is applied to
show the uniqueness of the stationary point in non-convex optimization problems such as Nash
equilibrium [107] and network equilibrium [106, 118]. However, their theorem cannot be
applied to our Bethe free energy function because the behavior of the function near the
boundary is so complicated that it is difficult to handle.

Figure 5.1: The sum of indices is one.

Figure 5.2: The sum of indices is still one.

We prove the index sum formula utilizing a property of the Bethe free energy function:
the gradient of the Bethe free energy function diverges as a point approaches the boundary
of the domain.

5.1.2 Overview of this chapter 

This chapter is organized as follows. In Section 5.2, we prove the index sum formula of
the Bethe free energy function using two lemmas. The first lemma describes an important
property of the Bethe free energy function: the divergence of the norm of the gradient
vector at the boundary of the domain. The second lemma is a standard result in
differential topology. The index sum formula, combined with the Bethe-zeta formula, provides a
powerful method of proving the uniqueness; we will prove the following two results
utilizing this method. Section 5.3 proves a uniqueness condition for general graphs, which is a
reproduction of the condition by Mooij et al. [87]. In Section 5.4, we focus on graphs of
nullity two and prove that the fixed point of LBP is unique if the interaction is not equivalent to an attractive one.

5.2 Index sum formula 

The purpose of this section is to show the index sum formula presented in the following
theorem. Throughout this chapter, the inference family is binary pairwise and the given
graphical model is

$$p(x) = \frac{1}{Z} \exp\Big( \sum_{ij \in E} J_{ij} x_i x_j + \sum_{i \in V} h_i x_i \Big), \qquad (5.2)$$

where G = (V, E) is a graph and $x_i = \pm 1$.

Theorem 5.1. Let F be the Bethe free energy function on the set of pseudomarginals L.
If $\det \nabla^2 F(\eta) \neq 0$ is satisfied for all $\eta \in (\nabla F)^{-1}(0)$, then

$$\sum_{\eta \in (\nabla F)^{-1}(0)} \mathrm{sgn}(\det \nabla^2 F(\eta)) = 1, \qquad (5.3)$$

where

$$\mathrm{sgn}(x) := \begin{cases} 1 & \text{if } x > 0, \\ -1 & \text{if } x < 0. \end{cases}$$

We call each summand, which is +1 or -1, the index of F at $\eta$.



Note that the set $(\nabla F)^{-1}(0)$, which is the set of stationary points of the Bethe free energy
function, coincides with the set of fixed points of LBP. The condition $\det \nabla^2 F(\eta) \neq 0$ in
the statement is not a strong requirement. In fact, the Hessian is invertible except on
a measure-zero region of L, and an LBP fixed point does not generally lie in that
region.

The above theorem asserts that the sum of indices of all the fixed points must be one.
As a consequence, the number of the fixed points of LBP is always odd. Heuristically,
the formula implies the existence of LBP fixed points because, if there were no LBP
fixed point, the L.H.S. of Eq. (5.3) would be equal to zero. A rigorous proof of the existence is
given by bounding the Bethe free energy function from below [136].

This formula is generalized to multinomial models straightforwardly. For Gaussian
models, however, this kind of formula does not hold. Indeed, even for one-cycle graphs,
LBP fixed points do not exist if the compatibility functions are, in a sense, strong [93].



5.2.1 Two lemmas 



For the proof, we need two lemmas. The first lemma shows the divergent behavior of the
gradient of the Bethe free energy function near the boundary of the domain.

Lemma 5.1. If a sequence $\{\eta_n\} \subset L$ converges to a point $\eta_* \in \partial L$, then

$$\|\nabla F(\eta_n)\| \to \infty, \qquad (5.4)$$

where $\partial L$ is the boundary of $L \subset \mathbb{R}^{|V|+|E|}$.

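Proof. Since our model is pairwise and binary, we choose the sufficient statistics as $\phi_{ij}(x_i, x_j) = x_i x_j$
and $\phi_i(x_i) = x_i$ ($x_i \in \{\pm 1\}$). We use the notations $m_i := \eta_i = \mathrm{E}_{b_i}[x_i]$ and $\chi_{ij} := \eta_{ij} = \mathrm{E}_{b_{ij}}[x_i x_j]$.
Then the Bethe free energy Eq. (2.36) is rewritten as

$$F(\{m_i, \chi_{ij}\}) = -\sum_{ij \in E} J_{ij} \chi_{ij} - \sum_{i \in V} h_i m_i + \sum_{ij \in E} \sum_{x_i, x_j = \pm 1} \eta\Big( \frac{1 + m_i x_i + m_j x_j + \chi_{ij} x_i x_j}{4} \Big) + \sum_{i \in V} (1 - d_i) \sum_{x_i = \pm 1} \eta\Big( \frac{1 + m_i x_i}{2} \Big), \qquad (5.5)$$

where $\eta(x) := x \log x$. The domain of F is written as

$$L := \big\{ \{m_i, \chi_{ij}\} \in \mathbb{R}^{|V|+|E|} \;\big|\; 1 + m_i x_i + m_j x_j + \chi_{ij} x_i x_j > 0 \text{ for all } ij \in E \text{ and } x_i, x_j = \pm 1 \big\}.$$

The first derivatives of the Bethe free energy function are

$$\frac{\partial F}{\partial m_i} = -h_i + (1 - d_i) \frac{1}{2} \sum_{x_i = \pm 1} x_i \log b_i(x_i) + \frac{1}{4} \sum_{k \in N_i} \sum_{x_i, x_k = \pm 1} x_i \log b_{ik}(x_i, x_k), \qquad (5.6)$$

$$\frac{\partial F}{\partial \chi_{ij}} = -J_{ij} + \frac{1}{4} \sum_{x_i, x_j = \pm 1} x_i x_j \log b_{ij}(x_i, x_j). \qquad (5.7)$$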

Note that it is enough to prove the assertion when $h_i = 0$ and $J_{ij} = 0$. We prove it
by contradiction. Assume that $\|\nabla F(\eta_n)\| \not\to \infty$. Then there exists $R > 0$ such that
$\|\nabla F(\eta_n)\| < R$ for infinitely many n. Let $B_0(R)$ be the closed ball of radius R centered at
the origin. Taking a subsequence, if necessary, we can assume that

$$\nabla F(\eta_n) \to \binom{\kappa}{\lambda} \in B_0(R), \qquad (5.8)$$

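because of the compactness of $B_0(R)$. Let $b^{(n)}_{ij}(x_i, x_j)$ and $b^{(n)}_i(x_i)$ be the pseudomarginals
corresponding to $\eta_n$. Since $\eta_n \to \eta_* \in \partial L$, there exist $ij \in E$, $x_i$ and $x_j$ such that

$$b^{(n)}_{ij}(x_i, x_j) \to 0.$$

Without loss of generality, we assume that $x_i = +1$ and $x_j = +1$. From Eq. (5.8), we have

$$\nabla F(\eta_n)_{ij} = \frac{1}{4} \log \frac{b^{(n)}_{ij}(+,+) \, b^{(n)}_{ij}(-,-)}{b^{(n)}_{ij}(+,-) \, b^{(n)}_{ij}(-,+)} \to \lambda_{ij}. \qquad (5.9)$$

Therefore $b^{(n)}_{ij}(+,-) \to 0$ or $b^{(n)}_{ij}(-,+) \to 0$ holds; we assume $b^{(n)}_{ij}(+,-) \to 0$ without loss
of generality. Now we have

$$b^{(n)}_i(+) = b^{(n)}_{ij}(+,-) + b^{(n)}_{ij}(+,+) \to 0.$$

In this situation, the following claim holds.

Claim. Let $k \in N_i$. In the limit of $n \to \infty$,

$$\sum_{x_i, x_k = \pm 1} x_i \log \frac{b^{(n)}_{ik}(x_i, x_k)}{b^{(n)}_i(x_i)} \qquad (5.10)$$

converges to a finite value.

Proof of claim. From $b^{(n)}_i(+) \to 0$, we have

$$b^{(n)}_{ik}(+,-), \; b^{(n)}_{ik}(+,+) \to 0 \quad \text{and} \quad b^{(n)}_i(-) \to 1.$$

Case 1: $b^{(n)}_{ik}(-,+) \to b^{*}_{ik}(-,+) \neq 0$ and $b^{(n)}_{ik}(-,-) \to b^{*}_{ik}(-,-) \neq 0$.
In the same way as Eq. (5.9),

$$\nabla F(\eta_n)_{ik} = \frac{1}{4} \log \frac{b^{(n)}_{ik}(+,+) \, b^{(n)}_{ik}(-,-)}{b^{(n)}_{ik}(+,-) \, b^{(n)}_{ik}(-,+)} \to \lambda_{ik}.$$

Therefore the ratio $b^{(n)}_{ik}(+,+) / b^{(n)}_{ik}(+,-)$ converges to a finite nonzero value.
Then we see that Eq. (5.10) converges to a finite value.

Case 2: $b^{(n)}_{ik}(-,+) \to 1$ and $b^{(n)}_{ik}(-,-) \to 0$.
Similar to Case 1, $b^{(n)}_{ik}(+,+) \, b^{(n)}_{ik}(-,-) / b^{(n)}_{ik}(+,-)$ converges to a finite nonzero value.
Therefore $b^{(n)}_{ik}(+,-) / b^{(n)}_{ik}(+,+) \to 0$. This implies that $b^{(n)}_{ik}(+,+) / b^{(n)}_i(+) \to 1$. Then we see that Eq. (5.10)
converges to a finite value.

Case 3: $b^{(n)}_{ik}(-,+) \to 0$ and $b^{(n)}_{ik}(-,-) \to 1$.
Same as Case 2. □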



Now let us get back to the proof of Lemma 5.1. We rewrite Eq. (5.6) as

$$\nabla F(\eta_n)_i = \frac{1}{2} \log b^{(n)}_i(+) - \frac{1}{2} \log b^{(n)}_i(-) + \frac{1}{4} \sum_{k \in N_i} \sum_{x_i, x_k = \pm 1} x_i \log \frac{b^{(n)}_{ik}(x_i, x_k)}{b^{(n)}_i(x_i)}. \qquad (5.11)$$

From Eq. (5.8), this value converges to $\kappa_i$. The second and the third terms in Eq. (5.11)
converge to finite values, while the first term diverges. This is a contradiction.
Therefore Lemma 5.1 is proved. □

The following lemma is a standard result in differential topology, and is utilized in the
proof of Theorem 5.1. We refer to Theorem 13.1.2 and the comments on p. 104 of [32] for the proof
of this lemma.

Lemma 5.2. Let $M_1$ and $M_2$ be compact, connected and orientable manifolds with
boundaries. Assume that the dimensions of $M_1$ and $M_2$ are the same. Let $f : M_1 \to M_2$ be
a smooth map satisfying $f(\partial M_1) \subset \partial M_2$. A point $p \in M_2$ is called a regular value if
$\det(\nabla f(q)) \neq 0$ for all $q \in f^{-1}(p)$. For a regular value $p \in M_2$, we define the degree of the
map f by

$$\deg f(p) := \sum_{q \in f^{-1}(p)} \mathrm{sgn}(\det \nabla f(q)). \qquad (5.12)$$

Then $\deg f(p)$ does not depend on the choice of a regular value $p \in M_2$.

The simplest example of this formula is $M_1 = M_2 = S^1$. In this case, there is no
boundary and f is just a smooth map from $S^1$ to itself. The degree of a map f is the
winding number.
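
The circle case can be illustrated numerically. The following is a minimal sketch under assumptions of our own: the particular map (with lift $\theta \mapsto \theta + 2 \sin\theta$, which has winding number one but is not monotone) and the grid-based detection of preimages are illustrative choices, not part of the thesis.

```python
import math

def lift(theta):
    # Lift of a winding-number-one circle map whose derivative changes sign.
    return theta + 2.0 * math.sin(theta)

def signed_preimage_count(p, n=20000):
    """Return (degree, number of preimages) of the point p in (0, 2*pi) under
    theta -> lift(theta) mod 2*pi, using sign changes on a uniform grid."""
    degree, count = 0, 0
    for k in (-1, 0, 1):                  # lift([0, 2*pi]) stays inside [0, 2*pi]
        target = p + 2.0 * math.pi * k
        prev = lift(0.0) - target
        for j in range(1, n + 1):
            cur = lift(2.0 * math.pi * j / n) - target
            if prev < 0.0 <= cur:         # upward crossing: derivative > 0
                degree += 1
                count += 1
            elif prev >= 0.0 > cur:       # downward crossing: derivative < 0
                degree -= 1
                count += 1
            prev = cur
    return degree, count

for p in (1.0, 3.0, 5.0):
    print(p, signed_preimage_count(p))
```

The number of preimages depends on the chosen value (one or three here), while the signed count does not, which is the content of the lemma in this simplest setting.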
5.2.2 Proof of Theorem 5.1

In this subsection, we prove Theorem 5.1. For the choice of the sufficient statistics and
notations, see the proof of Lemma 5.1.
Define a map $\Phi : L \to \mathbb{R}^{|V|+|E|}$ by

$$\Phi(\eta)_i = (1 - d_i) \frac{1}{2} \sum_{x_i = \pm 1} x_i \log b_i(x_i) + \frac{1}{4} \sum_{k \in N_i} \sum_{x_i, x_k = \pm 1} x_i \log b_{ik}(x_i, x_k), \qquad (5.13)$$

$$\Phi(\eta)_{ij} = \frac{1}{4} \sum_{x_i, x_j = \pm 1} x_i x_j \log b_{ij}(x_i, x_j), \qquad (5.14)$$

where $b_{ij}(x_i, x_j)$ and $b_i(x_i)$ are given by $\eta = \{m_i, \chi_{ij}\} \in L$. Therefore, we have $\nabla F = \Phi - \binom{h}{J}$
and $\nabla \Phi = \nabla^2 F$. Then the following claim holds.

Claim. Under the assumption of Theorem 5.1, the sets $\Phi^{-1}(\binom{h}{J}), \Phi^{-1}(0) \subset L$ are finite
and

$$\sum_{\eta \in \Phi^{-1}(\binom{h}{J})} \mathrm{sgn}(\det \nabla \Phi(\eta)) = \sum_{\eta \in \Phi^{-1}(0)} \mathrm{sgn}(\det \nabla \Phi(\eta)) \qquad (5.15)$$

holds.

Before the proof of this claim, we prove Theorem 5.1 under the claim.

Proof of Theorem 5.1. From Eqs. (5.13) and (5.14), it is easy to see that $\Phi(\eta) = 0 \iff \eta = \{m_i = 0, \chi_{ij} = 0\}$.
Indeed, $\Phi(\eta)_{ij} = 0$ is equivalent to

$$(1 + m_i + m_j + \chi_{ij})(1 - m_i - m_j + \chi_{ij}) = (1 - m_i + m_j - \chi_{ij})(1 + m_i - m_j - \chi_{ij})$$

and thus $\chi_{ij} = m_i m_j$. Plugging this relation into Eq. (5.13), one observes that $m_i = 0$ and hence
$\chi_{ij} = 0$. Moreover, at this point $\eta = \{m_i = 0, \chi_{ij} = 0\}$, $\nabla \Phi = \nabla^2 F$ is a positive definite
matrix because of Lemma 4.2. Therefore the RHS of Eq. (5.15) is equal to one. The LHS
of Eq. (5.15) is equal to the LHS of Eq. (5.3), because $\eta \in \Phi^{-1}(\binom{h}{J}) \iff \nabla F(\eta) = 0$. Then
the assertion of Theorem 5.1 is proved. □

Proof of the claim. First, we prove that $\Phi^{-1}(\binom{h}{J}) = (\nabla F)^{-1}(0)$ is a finite set. If not, we
can choose a sequence $\{\eta_n\}$ of distinct points from this set. Let $\bar{L}$ be the closure of L.
Since $\bar{L}$ is compact, we can choose a subsequence that converges to some point $\eta_* \in \bar{L}$.
From Lemma 5.1, $\eta_* \in L$ and $\nabla F(\eta_*) = 0$ hold. By the assumption in Theorem 5.1, we
have $\det \nabla^2 F(\eta_*) \neq 0$. This implies that $\nabla F(\eta) \neq 0$ for all $\eta \neq \eta_*$ in some neighborhood of $\eta_*$. This is
a contradiction because $\eta_n \to \eta_*$.

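Secondly, we prove the equality (5.15) using Lemma 5.2. Define a sequence of compact
convex sets $C_n := \{\eta \in L \mid \sum_{ij \in E} \sum_{x_i, x_j = \pm 1} -\log b_{ij}(x_i, x_j) \le n\}$, which are smooth manifolds with
boundary and increasingly converge to L. Since $\Phi^{-1}(0)$ and $\Phi^{-1}(\binom{h}{J})$ are finite, they are
included in $C_n$ for sufficiently large n. Take $K > 0$ and $\epsilon > 0$ to satisfy $K - \epsilon > \|\binom{h}{J}\|$.
From Lemma 5.1, we see that $\Phi(\partial C_n) \cap B_0(K) = \emptyset$ for sufficiently large n. Let $n_0$ be
such a large number. Let $\Pi_\epsilon : \mathbb{R}^{|V|+|E|} \to B_0(K)$ be a smooth map that is the identity on
$B_0(K - \epsilon)$, monotonically increasing in $\|x\|$, and satisfies $\Pi_\epsilon(x) = K x / \|x\|$ for $\|x\| \ge K$. Then we
obtain a composition map

$$\tilde{\Phi} := \Pi_\epsilon \circ \Phi : C_{n_0} \to B_0(K) \qquad (5.16)$$

that satisfies $\tilde{\Phi}(\partial C_{n_0}) \subset \partial B_0(K)$. By definition, we have $\tilde{\Phi}^{-1}(0) = \Phi^{-1}(0)$ and
$\tilde{\Phi}^{-1}(\binom{h}{J}) = \Phi^{-1}(\binom{h}{J})$. Therefore, both 0 and $\binom{h}{J}$ are regular values of $\tilde{\Phi}$. From Lemma 5.2, we have

$$\sum_{\eta \in \tilde{\Phi}^{-1}(\binom{h}{J})} \mathrm{sgn}(\det \nabla \tilde{\Phi}(\eta)) = \sum_{\eta \in \tilde{\Phi}^{-1}(0)} \mathrm{sgn}(\det \nabla \tilde{\Phi}(\eta)).$$

Then, the assertion of the claim is proved. □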



5.3 Uniqueness of LBP fixed point 

This section gives a short derivation of a uniqueness condition of LBP on general graphs,
exploiting the index sum formula and the Bethe-zeta formula. This condition is a
reproduction of Mooij's spectral condition, though the stronger convergence property is proved
under the same condition in [87]. For the binary pairwise case, numerical experiments in [87]
suggest that Mooij's condition is often superior to the conditions in [55, 119, 61].

To assure the uniqueness in advance of running LBP, we need a priori information on the 
LBP fixed points. The following lemma gives such information on the correlation coefficients 
of the beliefs. 

Lemma 5.3. Let $\beta_{ij}$ be the correlation coefficient of a fixed point belief $b_{ij}$. Then

$$|\beta_{ij}| \le \tanh(|J_{ij}|) \quad \text{and} \quad \mathrm{sgn}(\beta_{ij}) = \mathrm{sgn}(J_{ij}). \qquad (5.17)$$

Proof. Since the belief is given by Eq. (2.25), we see that

$$b_{ij}(x_i, x_j) \propto \exp(J_{ij} x_i x_j + \theta_i x_i + \theta_j x_j) \qquad (5.18)$$

for some $\theta_i$ and $\theta_j$. After a straightforward computation, one observes that

$$\beta_{ij} = \frac{\sinh(2 J_{ij})}{\sqrt{\cosh(2\theta_i) + \cosh(2 J_{ij})} \, \sqrt{\cosh(2\theta_j) + \cosh(2 J_{ij})}}. \qquad (5.19)$$

The bound is attained when $\theta_i = 0$ and $\theta_j = 0$.

□
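
The closed form for the correlation coefficient can be cross-checked by brute force. The sketch below is only a numerical sanity check with hypothetical function names: it enumerates the four states of a pairwise belief proportional to exp(J x_i x_j + theta_i x_i + theta_j x_j), computes the correlation coefficient directly, and compares it with the hyperbolic expression and the tanh bound.

```python
import math
from itertools import product

def correlation_by_enumeration(J, ti, tj):
    """Correlation coefficient of (x_i, x_j) under
    b(x_i, x_j) proportional to exp(J x_i x_j + ti x_i + tj x_j)."""
    states = list(product((-1, 1), repeat=2))
    w = [math.exp(J * x * y + ti * x + tj * y) for x, y in states]
    Z = sum(w)
    ex = sum(wi * x for wi, (x, y) in zip(w, states)) / Z
    ey = sum(wi * y for wi, (x, y) in zip(w, states)) / Z
    exy = sum(wi * x * y for wi, (x, y) in zip(w, states)) / Z
    cov = exy - ex * ey
    # x**2 = y**2 = 1, so the variances are 1 - mean**2.
    return cov / math.sqrt((1 - ex**2) * (1 - ey**2))

def correlation_closed_form(J, ti, tj):
    return math.sinh(2 * J) / math.sqrt(
        (math.cosh(2 * ti) + math.cosh(2 * J))
        * (math.cosh(2 * tj) + math.cosh(2 * J)))

for J, ti, tj in [(0.7, 0.0, 0.0), (0.7, 0.3, -1.2), (-1.5, 0.4, 0.9)]:
    b = correlation_by_enumeration(J, ti, tj)
    print(J, ti, tj, b, correlation_closed_form(J, ti, tj), math.tanh(abs(J)))
```

Since cosh(2 theta) is at least 1, the denominator is smallest at theta_i = theta_j = 0, where the expression reduces to tanh(J); this is where the bound is attained.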



Corollary 5.1 ([87]). Let $t := \{\tanh(|J_{[e]}|)\}_{e \in \vec{E}}$ be a set of directed edge weights of G =
(V, E). If $\rho(\mathcal{M}(t)) < 1$, then the fixed point of LBP is unique.

Proof. Let $\beta = \{\beta_{[e]}\}_{e \in \vec{E}}$. Since $|\beta_{ij}| \le \tanh(|J_{ij}|)$, we have $\rho(\mathcal{M}(\beta)) \le \rho(\mathcal{M}(t)) < 1$
(Theorem A.1 in Appendix A). From Lemma 4.3, we have $\det(I - \mathcal{M}(\beta)) = \det(I - \mathcal{M}(u))$.
Therefore, from the Bethe-zeta formula, $\det(\nabla^2 F)$ is positive for any fixed point of LBP.
The index sum formula implies that the fixed point is unique. □
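
The spectral condition can be evaluated before running LBP. The following is a rough sketch, assuming a simple graph given as a dictionary of couplings, the convention that the entry for a pair of directed edges (e, e') is tanh|J_{e'}| when e' follows e without backtracking, and an irreducible matrix so that shifted power iteration converges; the helper names and the K4 example are hypothetical.

```python
import math

def weighted_edge_matrix(J):
    """J maps undirected edges (i, j) to couplings J_ij.
    Rows and columns are indexed by directed edges."""
    darts = list(J) + [(j, i) for (i, j) in J]
    t = {}
    for (i, j), Jij in J.items():
        t[(i, j)] = t[(j, i)] = math.tanh(abs(Jij))
    return [[t[(k, l)] if (j == k and l != i) else 0.0 for (k, l) in darts]
            for (i, j) in darts]

def spectral_radius_nonneg(M, iters=2000):
    """Perron root of a nonnegative irreducible matrix, by power iteration
    on M + I (the shift removes periodicity and adds 1 to the Perron root)."""
    n = len(M)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[r] + sum(M[r][c] * v[c] for c in range(n)) for r in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam - 1.0

def k4(J):
    # Complete graph on four vertices with a uniform coupling J.
    return {(i, j): J for i in range(4) for j in range(i + 1, 4)}

for J in (0.3, 1.0):
    rho = spectral_radius_nonneg(weighted_edge_matrix(k4(J)))
    print(J, rho, "uniqueness guaranteed" if rho < 1 else "condition not satisfied")
```

On K4 every directed edge has two admissible successors, so the spectral radius is 2 tanh|J|; the condition holds exactly when tanh|J| < 1/2. When the condition fails, the corollary is simply silent about uniqueness.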

5.4 Uniqueness result for graphs with nullity two 

In this section we focus on graphs with nullity two and show the uniqueness of the LBP
fixed point for unattractive interactions. In the proof of the above corollary, we only used
the bounds on the moduli of correlation coefficients. In the following case of Corollary 5.2,
we can utilize the information of signs.

To state the corollary, we need some terminology. Two interactions $\{J_{ij}, h_i\}$ and $\{J'_{ij}, h'_i\}$
are said to be equivalent if there exists $(s_i) \in \{\pm 1\}^V$ such that $J'_{ij} = J_{ij} s_i s_j$ and $h'_i = h_i s_i$.
Since an equivalent model is obtained by a gauge transformation $x_i \to x_i s_i$, the uniqueness
property of LBP for equivalent models is unchanged. Recall that a given model $\{J_{ij}, h_i\}$ is
attractive if $J_{ij} > 0$ for all $ij \in E$.

Corollary 5.2. Let G be a connected graph with nullity two. If the interaction
is not equivalent to an attractive interaction, then the LBP fixed point is unique.

Interactions that are not equivalent to attractive interactions are sometimes referred to
as frustrated. For attractive interactions, there are possibly multiple LBP fixed points [88].

Figure 5.3: The graph G.

Figure 5.4: The graph $\hat{G}$.



For graphs with cycles, all the existing a priori conditions for the uniqueness essentially
upper bound the strength of interactions and do not depend on the signs of $J_{ij}$
[87, 55, 119, 61]. In contrast, Corollary 5.2 applies to arbitrary strength of interactions if
the graph has nullity two and the signs of $J_{ij}$ are frustrated.

It is known that the LBP fixed point is unique if the graph has nullity less than two.
If the graph is a tree, the fixed point is obviously unique. For graphs with nullity one,
the uniqueness is deduced from the uniqueness of the Perron-Frobenius eigenvector, which
corresponds to the fixed point message [128]. The result is easily understood from the
variational point of view; in these cases, the Bethe free energy function is convex by
Theorem 4.4. Compared to these cases, the uniqueness problem for graphs with nullity two
is much more difficult. If we write down the fixed point equation of the messages, it is a
complicated non-linear equation. In such approaches, it is hard to directly figure out that
the fixed point is unique.

5.4.1 Example 

In this subsection, we show an example to illustrate how to apply Theorem 4.1, Theorem
5.1 and Lemma 5.3 to prove the uniqueness. The complete proof of Corollary 5.2 is given
in the next subsection.

Let V := {1, 2, 3, 4} and E := {12, 13, 14, 23, 34}. The interactions are given by arbitrary
$\{h_i\}$ and $\{-J_{12}, J_{13}, J_{14}, J_{23}, J_{34}\}$ with $J_{ij} > 0$. See Figure 5.3. The + and - signs represent
those of the two-body interactions. For the uniqueness of the LBP fixed point, it is enough to check
that

$$\det(I - \mathcal{M}(\beta)) > 0 \qquad (5.20)$$

for arbitrary $0 < \beta_{13}, \beta_{23}, \beta_{14}, \beta_{34} < 1$ and $-1 < \beta_{12} < 0$ because of Theorem 4.1 and
Theorem 5.1. The graph $\hat{G}$ in Figure 5.4 is obtained by erasing vertices 2 and 4 in G. To
compute $\det(I - \mathcal{M}(\beta))$, it is enough to consider the smaller graph $\hat{G}$. In fact,

$$\det(I - \mathcal{M}(\beta)) = \zeta_G(\beta)^{-1} = \prod_{p \in P_G} (1 - g(p)) \qquad (5.21)$$
$$= \prod_{\hat{p} \in P_{\hat{G}}} (1 - g(\hat{p})) \qquad (5.22)$$
$$= \zeta_{\hat{G}}(\hat{\beta})^{-1} = \det(I - \hat{\mathcal{M}}(\hat{\beta})),$$

where $\hat{\beta}_{e_1} := \beta_{12} \beta_{23}$, $\hat{\beta}_{e_2} := \beta_{13}$, $\hat{\beta}_{e_3} := \beta_{14} \beta_{34}$ and $\hat{\beta}_{\bar{e}_i} = \hat{\beta}_{e_i}$ ($\bar{e}_i$ is the opposite directed
edge to $e_i$). The equality between Eq. (5.21) and (5.22) is obtained by the one-to-one
correspondence between the prime cycles of G and $\hat{G}$. By definition, we have

where the rows and columns are indexed by $e_1, e_2, e_3, \bar{e}_1, \bar{e}_2$ and $\bar{e}_3$. Then the determinant
is

Since $-1 < \hat{\beta}_{e_1} < 0$ and $0 < \hat{\beta}_{e_2}, \hat{\beta}_{e_3} < 1$, we conclude that this is positive.
5.4.2 Proof of Corollary 5.2

In this subsection, we prove Corollary 5.2 by classifying the graphs with nullity two. There
are two operations on graphs that do not change the set of prime cycles. The first one is
adding or erasing a vertex of degree two on any edge. The second one is adding or removing
an edge with a vertex of degree one. With these two operations, all graphs with nullity two
are reduced to three types of graphs. The first type is in Figure 5.4. The other two types
are in Figure 5.5.

Figure 5.5: Two other types of graphs.

Figure 5.6: List of interaction types.

Up to equivalence of interactions, all types of signs of two-body interactions are listed
in Figure 5.6 except for the attractive case. We will check the uniqueness for each case in
order. As discussed in the previous example, all we have to do is to prove

$$\det(I - \mathcal{M}(\beta)) > 0 \qquad (5.23)$$

for correlation coefficients $\beta$ in a certain region.

Case (1): Proved in Subsection 5.4.1.

Case (2): In this case,

where rows and columns are labeled by $e_1, e_2, e_3, \bar{e}_1, \bar{e}_2$ and $\bar{e}_3$. Then the determinant is

$$\det(I - \mathcal{M}(\beta)) = (1 - \beta_{e_1})(1 - \beta_{e_3})(1 - \beta_{e_1} - \beta_{e_3} + \beta_{e_1} \beta_{e_3} - 4 \beta_{e_1} \beta_{e_2}^2 \beta_{e_3}). \qquad (5.24)$$

This is positive when $0 < \beta_{e_1}, \beta_{e_2} < 1$ and $-1 < \beta_{e_3} < 0$.

Case (3): The determinant Eq. (5.24) is also positive when $0 < \beta_{e_2} < 1$ and $-1 < \beta_{e_1}, \beta_{e_3} < 0$.

Case (4): In this case,

where rows and columns are labeled by $e_1, e_2, \bar{e}_1$ and $\bar{e}_2$. Then we have

$$\det(I - \mathcal{M}(\beta)) = (1 - \beta_{e_1})(1 - \beta_{e_2})(1 - \beta_{e_1} - \beta_{e_2} - 3 \beta_{e_1} \beta_{e_2}). \qquad (5.25)$$

This is positive when $0 < \beta_{e_1} < 1$ and $-1 < \beta_{e_2} < 0$.

Case (5): The determinant Eq. (5.25) is positive when $-1 < \beta_{e_1}, \beta_{e_2} < 0$.
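
The factorized determinants in Eqs. (5.24) and (5.25) can be checked numerically. The sketch below rests on assumptions that are ours rather than the thesis's: the graph for Eq. (5.24) is taken to be two loops $e_1$, $e_3$ joined by a bridge $e_2$, the graph for Eq. (5.25) is taken to be one vertex with two loops, and the entry for directed edges (e, e') is the weight of e when e' follows e without backtracking (the determinant does not depend on whether the row or the column weight is used).

```python
def det(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    n, d = len(A), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def det_I_minus_M(edges, beta):
    """edges: (origin, terminus) of the undirected edges e_1..e_m (loops allowed);
    beta: their weights. Directed edge a < m is e_a, a + m is its reversal."""
    m = len(edges)
    n = 2 * m
    org = [o for o, t in edges] + [t for o, t in edges]
    ter = [t for o, t in edges] + [o for o, t in edges]
    w = list(beta) + list(beta)
    A = [[(1.0 if a == b else 0.0)
          - (w[a] if (ter[a] == org[b] and b != (a + m) % n) else 0.0)
          for b in range(n)] for a in range(n)]
    return det(A)

def eq_5_24(b1, b2, b3):
    return (1 - b1) * (1 - b3) * (1 - b1 - b3 + b1 * b3 - 4 * b1 * b2**2 * b3)

def eq_5_25(b1, b2):
    return (1 - b1) * (1 - b2) * (1 - b1 - b2 - 3 * b1 * b2)

dumbbell = [(0, 0), (0, 1), (1, 1)]   # loop e1, bridge e2, loop e3
bouquet2 = [(0, 0), (0, 0)]           # two loops at one vertex

print(det_I_minus_M(dumbbell, (0.5, 0.4, -0.6)), eq_5_24(0.5, 0.4, -0.6))
print(det_I_minus_M(bouquet2, (0.5, -0.5)), eq_5_25(0.5, -0.5))
```

Under these assumptions the numerical determinants agree with the closed forms, and sampling the sign regions listed in Cases (2) to (5) gives positive values.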



5.5 Discussion 



This chapter developed a new differential topological method for investigating the
uniqueness of the LBP fixed point. Our method is based on the index sum formula, which states
that the sum of indices at the LBP fixed points is equal to one. From this formula and the
Bethe-zeta formula, the uniqueness is proved if $\det(I - \mathcal{M}(\beta)) > 0$ at all LBP fixed points.

Applying this method, we proved the uniqueness under Mooij's spectral condition [87,
88]. Our proof gives an interpretation of why the directed edge matrix $\mathcal{M}$ appears in a
sufficient condition for the uniqueness.

We also showed the uniqueness for graphs with nullity two and frustrated interactions.
Though the computation of the exact partition function on such a graph is a feasible
problem, the proof of the uniqueness requires involved techniques. Indeed, our result implies that
a certain class of non-linear equations has a unique solution. It is noteworthy that our
approach is applicable to graphs that have nullity greater than two. For example, let $B_3$
be the bouquet graph, which has a single vertex and three self-loop edges. Applying our
method to graphs homeomorphic to $B_3$, we can straightforwardly show the uniqueness if
the signs of the two-body interactions are (+, -, -). It may be mathematically interesting
to find a class of edge-signed graphs that are guaranteed to have a unique LBP fixed point
with the prescribed signs of two-body interactions.

For the first application, in Corollary 5.1, we only used a priori bounds for the moduli of
correlation coefficients while, in Corollary 5.2, we also used information of signs but the
scope was restricted to graphs with nullity two. It would be interesting if we could utilize
both kinds of information for general graphs and show the uniqueness under a stronger condition than
Mooij's spectral condition.

In this chapter, for simplicity, we focused on pairwise binary models. However, our
method can be extended to multinomial models; the index sum formula is generalized to
multinomial cases in a straightforward way. Combining it with the Bethe-zeta formula, we can
show the uniqueness of the LBP fixed points under some conditions in an analogous manner. It
would be interesting to compare the uniqueness conditions for multinomial models obtained
by this method with those obtained in previous research [87, 55, 119, 61].

Chapter 6

Loop Series 



6.1 Introduction 

Given a graphical model, the computation of the partition function and marginal
distributions are important problems. For multinomial models, evaluations of these quantities on
general graphs are NP-hard [8, 29], but an efficient exact algorithm is known for tree-structured
graphs. Extending this efficient algorithm, loopy belief propagation, or the Bethe
approximation, provides empirically good approximations for these problems on general graphs.
Theoretical understanding of these approximation errors is an important issue because it would
provide practical guides for further improvement of the method. Since the method is exact
for trees, the error should be related to cycles of graphs.

In this line of research, Ikeda et al. [62, 63] have developed a perturbative analysis
of marginals based on information-geometric methods. In their analysis, cycles yield
non-zero m-embedded curvature of a submanifold called E* and this curvature produces the
discrepancy between the true and the approximate marginals. For pairwise models, another
approach to corrections of the Bethe approximation is considered by Montanari and
Rizzo [86]. In their method, correlations of neighbors, which are neglected in the Bethe
approximation, are counted. Parisi and Slanina [97] also derive corrections based on a
field-theoretic formulation of the Bethe approximation.

In this chapter, different from the aforementioned approaches, we investigate an
expansion called Loop Series (LS) introduced by Chertkov and Chernyak [24, 25]. In the loop
series approach, the partition function and marginal distributions of a binary model are
expanded into sums, where the first terms are exactly the LBP solutions. Therefore, the
analysis of the remaining terms is equivalent to the quality analysis of the LBP
approximation. The most remarkable feature of the expansion, which is not achieved by the other
approaches, is that the number of terms in the sum is finite. In the expansion of the
partition function, all terms are labeled by subgraphs of the factor graph called generalized
loops, or sub-coregraphs, allowing us to observe the effects of cycles in the factor graph. The
contribution of each term is the product of local contributions along the subgraph and is
easily calculated from the LBP output.

The number of terms in the loop series is exponential with respect to the nullity of the 
graph, so the direct summation is not feasible. (The number of sub-coregraphs is discussed 
in the next chapter.) However, it provides a theoretical background to understand the 
approximation errors and ways of correcting the bare LBP approximation. 

The remainder of this chapter is organized as follows. Section 6.2 derives the LS of the
partition function in our notation. In an analogous manner, we also expand the marginal
distributions. Section 6.3 presents applications of the LS. In Section 6.4, we discuss the
LS in a special case: the perfect matching problem.



6.2 Derivation of loop series 

In this section, we introduce the LS initiated by Chertkov and Chernyak [25, 24]. The
scope of the method is the binary multinomial models. Though our notation for the LS is
different from that of [25, 24], it is essentially equivalent to theirs. Our representation of the LS is
motivated by the graph polynomial treatment of the LS in the next chapter.

Beforehand, we review the setting. Assume that the given graphical model is

$$p(x) = \frac{1}{Z} \prod_{\alpha \in F} \Psi_\alpha(x_\alpha),$$

where $H = (V,F)$ is the factor graph, $\Psi_\alpha$ is a non-negative function, $x = (x_i)_{i \in V}$ and
$x_i \in \{\pm 1\}$. We perform inference by the LBP algorithm using the binary multinomial
inference family on $H$. After the LBP algorithm has converged, we obtain the beliefs
$\{b_\alpha(x_\alpha), b_i(x_i)\}_{\alpha \in F, i \in V}$ and the Bethe approximation of the partition function $Z_B$, which is
the starting point of the loop series.

6.2.1 Expansion of partition functions

The aim of this subsection is to state and prove the following theorem. We define a set of
polynomials $\{f_n(x)\}_{n=0}^{\infty}$ inductively by the relations $f_0(x) = 1$, $f_1(x) = 0$ and
$f_{n+1}(x) = x f_n(x) + f_{n-1}(x)$ $(n \geq 1)$. Therefore, $f_2(x) = 1$, $f_3(x) = x$ and so on. Moreover,
$f_n(-x) = (-1)^n f_n(x)$ and the coefficients of $f_n(x)$ are non-negative integers.
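
The recursion for $f_n$ is easy to tabulate. The following is a minimal sketch, not part of the original text, that stores each polynomial as a coefficient list (lowest degree first); the function name `f_coeffs` is introduced here only for illustration.

```python
def f_coeffs(n):
    """Coefficients of f_n, lowest degree first, from f_0 = 1, f_1 = 0."""
    prev, cur = [1], [0]                 # f_0 and f_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + cur              # coefficients of x * f_k(x)
        nxt = [0] * max(len(shifted), len(prev))
        for j, c in enumerate(shifted):
            nxt[j] += c
        for j, c in enumerate(prev):     # add f_{k-1}(x)
            nxt[j] += c
        while len(nxt) > 1 and nxt[-1] == 0:
            nxt.pop()                    # drop trailing zero coefficients
        prev, cur = cur, nxt
    return cur


for n in range(7):
    print(n, f_coeffs(n))
```

The printed lists make the two stated properties visible: every coefficient is a non-negative integer, and only powers of $x$ with the same parity as $n$ occur, which is the content of $f_n(-x) = (-1)^n f_n(x)$.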

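Theorem 6.1 (Loop series expansion). Let $H = (V,F)$ be the factor graph of the given
graphical model and let $B_H = (V \cup F, E_{B_H})$ be the bipartite representation of $H$. Assume
that LBP converges to a fixed point and obtains $Z_B$ and $\{b_\alpha(x_\alpha), b_i(x_i)\}$. Then the following
expansion of the partition function holds:

$$Z = Z_B \sum_{s \subset E_{B_H}} r(s), \qquad r(s) := (-1)^{|s|} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V} f_{d_i(s)}(\gamma_i), \qquad (6.2)$$

where $m_i = \mathrm{E}_{b_i}[x_i]$,

$$\beta^{\alpha}_{I} := \mathrm{E}_{b_\alpha}\Big[ \prod_{i \in I} \frac{x_i - m_i}{\sqrt{1 - m_i^2}} \Big], \qquad (6.3)$$

$$\gamma_i := \frac{2 m_i}{\sqrt{1 - m_i^2}}. \qquad (6.4)$$

Note that, for $s \subset E_{B_H}$, $I_\alpha(s) \subset V$ is the set of vertices that are connected to $\alpha$ by edges
of $s$, and $d_i(s)$ is the number of factors connected to $i \in V$ by the edges of $s$.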

In Eq. (6.2), there is a summation over all subsets of $E_{B_H}$. An edge set $s \subset E_{B_H}$ is
identified with the spanning subgraph $(V \cup F, s)$ of $B_H$. Since $f_1(x) = 0$ and $\beta^{\alpha}_{\{i\}} = 0$, a
subgraph $s$ makes a contribution to the summation only if $s$ has neither vertices nor factors
of degree one. Therefore, the summation is over all coregraphs of the form $(V \cup F, s)$; we
call them sub-coregraphs. In relevant papers, such subgraphs are called generalized loops
[23, 25] or closed subgraphs [90, 91]. Figures 6.1 and 6.2 give an example of a hypergraph
and its sub-coregraphs. In this example, there are five sub-coregraphs. The number of
sub-coregraphs will be discussed in Section 7.3.

Each summand $r(s)$ of the expansion is easily calculated from the resulting beliefs. In fact,
$\beta^{\alpha}_{I}$ is the multi-correlation coefficient of the variables in $I$ and $\gamma_i$ is the scaled bias of the
variable $x_i$. Both are efficiently calculated and $r(s)$ is a product of them.

Figure 6.1: A hypergraph.



Figure 6.2: The list of sub-coregraphs. 



The contribution of the empty set is $r(\emptyset) = 1$ because $f_0(x) = 1$ and $\beta^{\alpha}_{\emptyset} = 1$. Therefore,
the "first" term of the loop series expansion Eq. (6.2) is $Z_B$. In this sense, the LS is an
expansion around the Bethe approximation.

Note that, for the pairwise case, the loop series can be understood from the viewpoint
of message passing schemes [127]. The correlation coefficients $\beta_{ij}$ are the second
eigenvalues of the message transfer matrices.

The proof of Theorem 6.1 is divided into two lemmas. The first lemma provides a compact
characterization of the ratio of the true partition function and its Bethe approximation.
The second lemma is the key identity for the loop series expansion.



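Lemma 6.1.

$$\frac{Z}{Z_B} = \sum_{x \in \{\pm 1\}^V} \prod_{\alpha \in F} \frac{b_\alpha(x_\alpha)}{\prod_{i \in \alpha} b_i(x_i)} \prod_{i \in V} b_i(x_i). \qquad (6.5)$$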



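Proof. From the definition of $Z_B$ in Subsection 2.3.3 we have $\log Z_B = \sum_{\alpha \in F} \psi_\alpha(\theta_\alpha) +
\sum_{i \in V} (1 - d_i)\psi_i(\theta_i)$. From Eq. (2.42) and the assumption on the model, the fixed-point condition of LBP
comes to

$$\frac{1}{Z_B} \prod_{\alpha \in F} \Psi_\alpha(x_\alpha) = \prod_{\alpha \in F} \frac{b_\alpha(x_\alpha)}{\prod_{i \in \alpha} b_i(x_i)} \prod_{i \in V} b_i(x_i). \qquad (6.6)$$

Taking the sum over $(x_i)_{i \in V}$, we obtain the asserted equation. □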



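Lemma 6.2. For each factor $\alpha \in F$ and $I \subset \alpha$, we introduce an indeterminate $\beta^{\alpha}_{I}$, where
we use the notation $\beta^{\alpha}_{\emptyset} = 1$ and $\beta^{\alpha}_{I} = 0$ if $|I| = 1$. The following identity holds:

$$\sum_{x \in \{\pm 1\}^V} \prod_{\alpha \in F} \Big( \sum_{I \subset \alpha} \beta^{\alpha}_{I}\, x_{i_1}\xi_{i_1}^{-x_{i_1}} \cdots x_{i_k}\xi_{i_k}^{-x_{i_k}} \Big) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}}
= \sum_{s \subset E_{B_H}} (-1)^{|s|} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V} f_{d_i(s)}(\xi_i - \xi_i^{-1}), \qquad (6.7)$$

where $I = \{i_1, \ldots, i_k\}$.
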
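Proof.

$$(\text{l.h.s.}) = \sum_{s \subset E_{B_H}} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V} \sum_{x_i = \pm 1} \frac{x_i^{d_i(s)}\, \xi_i^{(1 - d_i(s)) x_i}}{\xi_i + \xi_i^{-1}}
= \sum_{s \subset E_{B_H}} (-1)^{|s|} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V} \frac{\xi_i^{d_i(s)-1} + (-1)^{d_i(s)} \xi_i^{1-d_i(s)}}{\xi_i + \xi_i^{-1}}.$$

In the first equality, we took the sum over $I \subset \alpha$ out of the product over $\alpha \in F$. In the second
equality, we used $\sum_{i \in V} d_i(s) = |s|$, which holds because every edge of $B_H$ has exactly one endpoint
in $V$. On the other hand, by the definition of $f_n$, we have

$$f_n(\xi - \xi^{-1}) = \frac{\xi^{n-1} + (-1)^n \xi^{1-n}}{\xi + \xi^{-1}}. \qquad (6.8)$$

Then the identity is proved. □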
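
The proof relies on a closed form of $f_n$ evaluated at $\xi - \xi^{-1}$, namely $f_n(\xi - \xi^{-1}) = (\xi^{n-1} + (-1)^n \xi^{1-n})/(\xi + \xi^{-1})$. The sketch below, which is an illustration and not part of the original argument, compares this expression with the value obtained directly from the recursion; the helper names are hypothetical.

```python
def f_value(n, x):
    """Evaluate f_n(x) from f_0 = 1, f_1 = 0, f_{k+1} = x*f_k + f_{k-1}."""
    prev, cur = 1.0, 0.0
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def f_closed_form(n, xi):
    """Candidate closed form of f_n(xi - 1/xi) for xi > 0."""
    return (xi ** (n - 1) + (-1) ** n * xi ** (1 - n)) / (xi + 1.0 / xi)


def max_gap(xi, n_max=10):
    """Largest deviation between recursion and closed form for n <= n_max."""
    return max(abs(f_value(n, xi - 1.0 / xi) - f_closed_form(n, xi))
               for n in range(n_max + 1))


for xi in (0.5, 1.3, 2.0):
    print(xi, max_gap(xi))
```

The gaps are at the level of floating-point rounding, consistent with an induction on $n$.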

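Proof of Theorem 6.1. Let $\{\beta^{\alpha}_{I}\}_{I \subset \alpha, |I| \geq 2}$ be given by Eq. (6.3), let $\gamma_i$ be given by
Eq. (6.4), and let $\xi_i$ be determined by $\gamma_i = \xi_i - \xi_i^{-1}$. Then $b_\alpha(x_\alpha)$ has the following form:

$$b_\alpha(x_\alpha) = \frac{1}{\prod_{i \in \alpha} (\xi_i + \xi_i^{-1})} \sum_{I \subset \alpha} \beta^{\alpha}_{I} \prod_{i \in I} x_i \xi_i^{-x_i} \prod_{j \in \alpha} \xi_j^{x_j}. \qquad (6.9)$$

Indeed, from Eq. (6.9), we can check the $2^{|\alpha|} - 1$ conditions Eqs. (6.3) and (6.4), which determine
$b_\alpha$ completely. Using $b_i(x_i) = \xi_i^{x_i} / (\xi_i + \xi_i^{-1})$, we see that the R.H.S. of Lemma 6.1 is equal to the
L.H.S. of Lemma 6.2. Accordingly, the expansion formula is proved. □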



6.2.2 Expansion of marginals 

In this subsection, we expand the true marginal distribution $p_1(x_1) := \sum_{x \setminus \{x_1\}} p(x)$ rather
than the partition function. For the sake of simplicity, we write down the expansion of
$p_1(+1) - p_1(-1)$ rather than $p_1(+1)$ or $p_1(-1)$. Since the variable is binary, this is enough.
We define a set of polynomials $\{g_n(x)\}_{n=0}^{\infty}$ inductively by the relations $g_0(x) = x$, $g_1(x) = -2$
and $g_{n+1}(x) = x g_n(x) + g_{n-1}(x)$. Therefore, $g_2(x) = -x$, $g_3(x) = -x^2 - 2$, and so on.

Theorem 6.2. In the same situation as Theorem 6.1, the following expansion holds:

$$\frac{Z}{Z_B}\, \frac{p_1(+1) - p_1(-1)}{\sqrt{b_1(+1) b_1(-1)}} = \sum_{s \subset E_{B_H}} (-1)^{|s|} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V \setminus \{1\}} f_{d_i(s)}(\gamma_i)\; g_{d_1(s)}(\gamma_1). \qquad (6.10)$$
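
The polynomials $g_n$ entering the marginal expansion obey the same three-term recursion as $f_n$, with different initial values. A minimal sketch (illustrative only; `g_value` and `g_closed_form` are names introduced here) evaluates them and compares with the closed form $g_n(\xi - \xi^{-1}) = (-1)^n \xi^{1-n} - \xi^{n-1}$, which is not stated in the text but follows from the recursion by induction.

```python
def g_value(n, x):
    """Evaluate g_n(x) from g_0 = x, g_1 = -2, g_{k+1} = x*g_k + g_{k-1}."""
    prev, cur = x, -2
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def g_closed_form(n, xi):
    """Assumed closed form of g_n(xi - 1/xi), derived by induction."""
    return (-1) ** n * xi ** (1 - n) - xi ** (n - 1)


for x in (-2, 0, 3):
    print(x, [g_value(n, x) for n in range(5)])
```

With integer arguments the evaluation is exact, so the small cases $g_2(x) = -x$ and $g_3(x) = -x^2 - 2$ can be read off directly.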

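Proof. The proof proceeds in a similar fashion to the proof of Theorem 6.1. Analogous to
Lemma 6.1, we have

$$\frac{Z}{Z_B}\,\big(p_1(+1) - p_1(-1)\big) = \sum_{x \in \{\pm 1\}^V} x_1 \prod_{\alpha \in F} \frac{b_\alpha(x_\alpha)}{\prod_{i \in \alpha} b_i(x_i)} \prod_{i \in V} b_i(x_i), \qquad (6.11)$$

which is obtained from Eq. (6.6). On the other hand, using

$$\sum_{x_1 = \pm 1} x_1\, (x_1 \xi_1^{-x_1})^{n}\, \xi_1^{x_1} = (-1)^n g_n(\xi_1 - \xi_1^{-1}), \qquad (6.12)$$

we obtain an identity similar to Lemma 6.2, that is,

$$(\xi_1 + \xi_1^{-1}) \sum_{x \in \{\pm 1\}^V} x_1 \prod_{\alpha \in F} \Big( \sum_{I \subset \alpha} \beta^{\alpha}_{I}\, x_{i_1}\xi_{i_1}^{-x_{i_1}} \cdots x_{i_k}\xi_{i_k}^{-x_{i_k}} \Big) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}}
= \sum_{s \subset E_{B_H}} (-1)^{|s|} \prod_{\alpha \in F} \beta^{\alpha}_{I_\alpha(s)} \prod_{i \in V \setminus \{1\}} f_{d_i(s)}(\gamma_i)\; g_{d_1(s)}(\gamma_1). \qquad (6.13)$$

Combining these equations and $(\xi_1 + \xi_1^{-1}) = (b_1(+1) b_1(-1))^{-1/2}$, the theorem is proved. □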

By definition, the sum in the expansion is taken over all the subgraphs $s = (V \cup F, s)$ of
$B_H$. The subgraph $s$ contributes to the sum only if no vertex or factor has degree one,
except possibly the vertex $1 \in V$.

The "first" term $r(\emptyset)$ is equal to $g_0(\gamma_1) = \gamma_1 = \frac{b_1(+1) - b_1(-1)}{\sqrt{b_1(+1) b_1(-1)}}$. Therefore, omitting the
remaining terms, we obtain the approximation

$$\operatorname*{argmax}_{x_1 = \pm 1} p_1(x_1) \approx \operatorname*{argmax}_{x_1 = \pm 1} b_1(x_1). \qquad (6.14)$$

The assignment $x_1$ that maximizes the marginal probability distribution $p_1(x_1)$ is called the
Maximum Posterior Marginal (MPM) assignment. The above argument suggests that the
Bethe approximation of the MPM problem for $p_1$, which is obtained by taking the first term,
is given by the MPM of $b_1$.

MPM problems of marginal distributions are especially important in the application of
error correcting codes. In such applications, the receiver wants to infer the sent bits rather
than the probabilities.

6.3 Applications of LS 

In this section, we discuss applications of the loop series. The first subsection provides our
application of the LS and the next subsection reviews results of other authors.

6.3.1 One-cycle graphs 

In [128], it is shown that if the graph has a unique cycle and the node of interest $1 \in V$
is on the cycle, then the assignment that maximizes the belief $b_1$ gives the exact MPM
assignment. Using Theorem 6.2, we can easily show the result as follows [127].

Theorem 6.3. Let $G = (V, E)$ be a graph with a single cycle with the node $1 \in V$ on it.
(See Figure 6.3 for an example.) Then $p_1(1) - p_1(-1)$ and $b_1(1) - b_1(-1)$ have the same sign.

Proof. In the right-hand side of Eq. (6.10), only two subgraphs $s$ contribute to the sum:
the empty set and the unique cycle. From $g_0(\gamma_1) = \gamma_1$, $g_2(\gamma_1) = -\gamma_1$ and $|\beta_{ij}| < 1$, we
see that the sum is positively proportional to $\gamma_1$. □

If 1 is not on the unique cycle, this property does not hold. In this case, three types of
subgraphs appear in Eq. (6.10).

6.3.2 Review of other applications 

Attractive models: A notable feature of the Bethe approximation of the partition function
is that, for certain classes of models, it lower bounds the true partition function, i.e., $Z_B \leq Z$.
As shown in [113], this fact is deduced using the LS for "attractive models", a
subclass of binary multinomial models. Their definition of attractive models coincides with
the condition $J_{ij} \geq 0$ for the pairwise binary case. In fact, if one of the following conditions

1. $\gamma_i \geq 0$ for all $i \in V$, and $(-1)^{|I|} \beta^{\alpha}_{I} \geq 0$ for all $I \subset \alpha$

2. $\gamma_i \leq 0$ for all $i \in V$, and $\beta^{\alpha}_{I} \geq 0$ for all $I \subset \alpha$

holds, every summand $r(s)$ is non-negative, so $Z/Z_B$ is greater than or equal to one.

Planar graphs: In [26], Chertkov et al. have shown that the partial sum of the loop
series over the one-cycle sub-coregraphs reduces to the evaluation of the partition function of
the perfect matchings on an extended, 3-regular graph. Weights of the perfect matchings
are easily calculated from the LBP output. If the graph is planar, then the extended graph is
also planar by construction and the computation of the perfect matching partition function
reduces to a Pfaffian. Thus, the partial sum of the loop series is computed by a tractable
determinant. Moreover, they find that the entire LS is reducible to a weighted Pfaffian
series, where each Pfaffian is a partial sum of the loop series.

In [46], Gomez et al. have proposed an approximation algorithm for partition functions
on planar graphs based on the above result. Experimental results are presented for
planar-intractable binary models, showing significant improvements over LBP.

Independent sets: The LS framework is utilized to analyze the performance of the Bethe
approximation for counting independent sets. An independent set is a set of vertices such
that there is no edge between any two of the vertices. In [22], Chandrasekaran et al.
established that for any graph $G = (V,E)$ with max-degree $d$ and girth larger than $8d \log_2 |V|$,
the multiplicative error decays as $1 + O(|V|^{-\gamma})$ for a certain $\gamma > 0$.

6.4 Special Case: Perfect matchings 

In this section, we apply the LS technique to a special class of problems: the partition
function of perfect matchings. As we show in Theorem 6.4, the loop series expansion has an
interesting form in this case. First, we introduce the partition function of perfect matchings
using the language of graphical models. Then we describe the Bethe approximation in this
specific case.

Let $G = (V,E)$ be a graph with non-negative edge weights $w = \{w_{ij}\}_{ij \in E}$. A matching
$D$ of $G$ is a set of edges such that no two edges share a vertex. A matching is
perfect if all the vertices are covered. The partition function of the perfect matchings is
given by

$$Z = \sum_{D:\, \text{perfect}} \prod_{ij \in D} w_{ij}, \qquad (6.15)$$

where the sum is taken over all the perfect matchings. (An extension of this class of partition
functions, called monomer-dimer partition functions, will appear in Subsection 7.5.2.) This
partition function is formulated by a graphical model over binary edge variables
$\sigma = \{\sigma_{ij} \in \{0,1\}\}_{ij \in E}$. A perfect matching $D$ is identified with $\sigma$ that satisfies
$\sum_{j \in N_i} \sigma_{ij} = 1$ for all $i \in V$. Let us define

$$\Psi_i(\sigma_i) = \begin{cases} \sqrt{w_{ij_0}} & \text{if } \sum_{j \in N_i} \sigma_{ij} = 1 \text{ and } \sigma_{ij_0} = 1, \\ 0 & \text{otherwise,} \end{cases}$$

where $\sigma_i = \{\sigma_{ij}\}_{j \in N_i}$. In fact, Eq. (6.15) is the normalization constant of a graphical model:

$$p(\sigma) = \frac{1}{Z} \prod_{i \in V} \Psi_i(\sigma_i). \qquad (6.16)$$

This is a probability distribution over all the perfect matchings. Note that the corresponding
factor graph $H$ has the vertex set and the factor set identified with $E$ and $V$, respectively.
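
The graphical-model formulation can be checked by brute force on a small graph. The sketch below is an illustration under an explicit assumption: each vertex factor takes the value $\sqrt{w_{ij_0}}$ on the unique selected edge, so that the two endpoints of a matched edge together contribute $w_{ij}$. The function names and the example graph are introduced here for illustration only.

```python
from itertools import product
from math import sqrt


def vertex_factor(i, sigma, edges, w):
    """Assumed factor Psi_i: sqrt of the weight of the single chosen edge at i."""
    chosen = [e for e, (a, b) in enumerate(edges)
              if sigma[e] == 1 and i in (a, b)]
    return sqrt(w[chosen[0]]) if len(chosen) == 1 else 0.0


def partition_function(n_vertices, edges, w):
    """Sum over all binary edge configurations of the product of vertex factors."""
    total = 0.0
    for sigma in product((0, 1), repeat=len(edges)):
        term = 1.0
        for i in range(n_vertices):
            term *= vertex_factor(i, sigma, edges, w)
        total += term
    return total


# 4-cycle with weights 2, 3, 5, 7: the two perfect matchings give 2*5 + 3*7.
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(partition_function(4, CYCLE, [2.0, 3.0, 5.0, 7.0]))
```

Only configurations in which every vertex sees exactly one selected edge survive, so the sum reduces to the sum over perfect matchings of the products of edge weights.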

We apply the Bethe approximation and the loop series method to this partition function.
Since the functions $\Psi_i$ take zero values, the domain of the Bethe free energy function $F$ is
restricted. We choose the parameters $v_{ij} = b_{ij}(1)$; then $b_i$ is determined by $v_{ij}$:

$$b_i(\sigma_i) = \begin{cases} v_{ij_0} & \text{if } \sum_{j \in N_i} \sigma_{ij} = 1 \text{ and } \sigma_{ij_0} = 1, \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, the Bethe free energy function is significantly simplified in our case:

$$F(v) = -\sum_{ij \in E} v_{ij} \log(w_{ij}) + \sum_{ij \in E} \{ v_{ij} \log v_{ij} - (1 - v_{ij}) \log(1 - v_{ij}) \}, \qquad (6.17)$$

where the domain of this function is given by

$$U := \Big\{ v = \{v_{ij}\}_{ij \in E} \;\Big|\; v_{ij} \geq 0, \ \sum_{j \in N_i} v_{ij} = 1 \ \ \forall i \in V \Big\}. \qquad (6.18)$$

To analyze the stationary points of the Bethe free energy function, which give the Bethe
approximation, it is useful to introduce the following Lagrangian:

$$\mathcal{L}(v, \mu) = F(v) - \sum_{i \in V} (\mu_i + 1) \Big( \sum_{j \in N_i} v_{ij} - 1 \Big). \qquad (6.19)$$

Looking for a stationary point of Eq. (6.19) over the $v$ variables, one arrives at the following
set of quadratic equations for each variable $v_{ij}$:

$$v_{ij}(1 - v_{ij}) = w_{ij} \exp(\mu_i + \mu_j) \quad \text{for all } ij \in E. \qquad (6.20)$$

We call a point $v \in U$ that satisfies this equation an LBP solution. At an LBP solution, the
Bethe approximation of the partition function is given by Eq. (6.17) and $F(v) = -\log Z_B$.

6.4.1 Loop Series of perfect matching 

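Theorem 6.4. Let $v = \{v_{ij}\}$ be an LBP solution and $Z_B$ be the Bethe approximation.
Then the following expansion holds:

$$Z = Z_B \sum_{s \subset E} r(s), \qquad r(s) = \prod_{i \in V} (1 - d_i(s)) \prod_{ij \in s} \frac{v_{ij}}{1 - v_{ij}}. \qquad (6.21)$$

Proof. We transform the 0,1 variables $\sigma_{ij}$ to ±1 variables by $x_{ij} = 1 - 2\sigma_{ij}$. Then
$m_{ij} = \mathrm{E}_{b_{ij}}[x_{ij}] = 1 - 2v_{ij}$. For a factor $\alpha = i$ (i.e. the factor corresponding to the vertex $i$
of the original graph) and $I \subset \alpha$, one derives

$$\beta^{i}_{I} = \mathrm{E}_{b_i}\Big[ \prod_{j \in I} \frac{x_{ij} - m_{ij}}{\sqrt{1 - m_{ij}^2}} \Big] = (1 - |I|) \prod_{j \in I} \sqrt{\frac{v_{ij}}{1 - v_{ij}}}.$$

Every variable of the factor graph has degree two, and $f_0 = f_2 = 1$, $f_1 = 0$, so only those
subsets of $E_{B_H}$ contribute that use both or neither of the two edges at each variable; these
correspond to the subsets $s \subset E$. Then the expansion Eq. (6.21) is deduced easily. □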
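
The expansion $Z = Z_B \sum_{s \subset E} \prod_{i}(1 - d_i(s)) \prod_{ij \in s} v_{ij}/(1 - v_{ij})$ can be verified numerically on a small example. The sketch below is illustrative and rests on stated assumptions: it takes the complete bipartite graph $K_{3,3}$ with unit weights, for which the uniform point $v_{ij} = 1/3$ satisfies the LBP conditions by symmetry, and uses the Bethe free energy $F(v) = -\sum v \log w + \sum \{ v \log v - (1 - v)\log(1 - v) \}$. Function names are hypothetical.

```python
from itertools import combinations
from math import exp, log


def loop_series_sum(n_vertices, edges, v):
    """Sum over edge subsets s of prod_i (1 - d_i(s)) * prod_{e in s} v_e/(1 - v_e)."""
    total = 0.0
    for k in range(len(edges) + 1):
        for s in combinations(range(len(edges)), k):
            deg = [0] * n_vertices
            term = 1.0
            for e in s:
                i, j = edges[e]
                deg[i] += 1
                deg[j] += 1
                term *= v[e] / (1.0 - v[e])
            for d in deg:
                term *= (1 - d)
            total += term
    return total


def bethe_partition(w, v):
    """Z_B = exp(-F(v)) with the simplified Bethe free energy for matchings."""
    free_energy = sum(-ve * log(we) + ve * log(ve) - (1 - ve) * log(1 - ve)
                      for we, ve in zip(w, v))
    return exp(-free_energy)


K33 = [(a, 3 + b) for a in range(3) for b in range(3)]
W = [1.0] * 9
V = [1.0 / 3.0] * 9          # uniform LBP solution, assumed by symmetry
Z_B = bethe_partition(W, V)
print(Z_B, loop_series_sum(6, K33, V), Z_B * loop_series_sum(6, K33, V))
```

For this graph the true partition function is the number of perfect matchings of $K_{3,3}$, which is $3! = 6$, and the product of the Bethe value with the loop series sum reproduces it up to rounding.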

In [90], Nagle derived an expansion of the monomer-dimer partition function, which reduces
to an expansion of the perfect matching partition function if the monomer weights are zero.
Our expansion Eq. (6.21) extends his reduced expansion, where only the case of regular graphs with
uniform edge weights is discussed. Note that a similar $(1 - d_i(s))$ type loop series
also appears in the definition of the omega polynomial, which we will discuss in Section 7.5.

For a given non-negative square matrix $A$ of size $N$, the permanent of $A$ is equal to the
partition function of the perfect matchings on the complete bipartite graph $K_{N,N}$
with edge weights $A_{ij}$. In [27], Chertkov et al. discuss this permanent problem and use
the loop series Eq. (6.21) to improve the Bethe approximation. It is noteworthy that,
empirically, the Bethe approximation lower bounds the true permanent.
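
The identification of the permanent with a perfect matching partition function can be illustrated directly. The following sketch (illustrative; all names are introduced here) computes the permanent as a sum over permutations and compares it with a sum over perfect matchings of the complete bipartite graph whose left vertex $i$ and right vertex $N + j$ are joined by an edge of weight $A_{ij}$.

```python
from itertools import permutations


def permanent(A):
    """Permanent of a square matrix as a sum over permutations."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= A[i][p[i]]
        total += term
    return total


def matching_sum(n_vertices, weights):
    """Sum over perfect matchings; weights maps frozenset({i, j}) to w_ij."""
    def rec(unmatched):
        if not unmatched:
            return 1
        i = min(unmatched)               # always match the smallest free vertex
        rest = unmatched - {i}
        return sum(weights[frozenset((i, j))] * rec(rest - {j})
                   for j in rest if frozenset((i, j)) in weights)
    return rec(frozenset(range(n_vertices)))


def bipartite_weights(A):
    """Edge weights of K_{N,N}: left vertex i, right vertex N + j."""
    n = len(A)
    return {frozenset((i, n + j)): A[i][j] for i in range(n) for j in range(n)}


A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(permanent(A), matching_sum(6, bipartite_weights(A)))
```

Both enumerations are exponential in $N$ and are meant only as a consistency check on small matrices.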

6.4.2 Loop Series by Ihara-Bass type determinant formula 

The aim of this subsection is to demonstrate the importance and ubiquity of the graph
zeta function in the context of LBP and the Bethe approximation. More precisely, we give
another proof of Theorem 6.4 based on the Ihara-Bass type determinant formula. Analogous
to the proof of Theorem 6.1, the proof proceeds in two steps. First, in Lemma 6.3, we give a
compact representation of the ratio of the partition functions. Secondly, we prove an identity
involving an average of determinants.

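Lemma 6.3. For an LBP solution $v = \{v_{ij}\}$, the following holds:

$$\frac{Z(w)}{Z_B} = Z(v .\!* (1 - v)) \prod_{ij \in E} (1 - v_{ij})^{-1}, \qquad (6.22)$$

where $Z(w)$ denotes the perfect matching partition function Eq. (6.15) with edge weights $w$, and
$A .\!* B$ denotes the element-by-element multiplication of two matrices $A$ and $B$ of
equal sizes.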

Proof. From the definition of the Bethe free energy Eq. (6.17) and the conditions for LBP
solutions Eqs. (6.18) and (6.20), one derives

$$Z_B = \prod_{ij \in E} (1 - v_{ij}) \prod_{ij \in E} \Big[ \frac{w_{ij}}{v_{ij}(1 - v_{ij})} \Big]^{v_{ij}} = \prod_{ij \in E} (1 - v_{ij}) \prod_{i \in V} \exp(-\mu_i). \qquad (6.23)$$

On the other hand, Eq. (6.20) results in $Z(w) = Z(v .\!* (1 - v)) \prod_{i \in V} \exp(-\mu_i)$. Combining
the two equations, we arrive at Eq. (6.22). □

Note that this lemma, which represents the ratio in terms of another partition function of
perfect matchings, is reminiscent of Lemma 6.1.

Lemma 6.4 (LS as average of determinants). Let $\vec{E}$ be the set of directed edges of the graph
$G = (V, E)$ and $x = \{x_{i \to j}\}_{(i \to j) \in \vec{E}}$ be a set of random variables that satisfies $\mathrm{E}[x_{i \to j}] = 0$,
$\mathrm{E}[x_{i \to j} x_{j \to i}] = 1$ and $\mathrm{E}[x_{i \to j} x_{k \to l}] = 0$ $(\{k,l\} \neq \{i,j\})$. (Here and below $\mathrm{E}_x[\cdots]$ stands
for the expectation over the random variables $x$.) Then the following relation holds:

$$\sum_{s \subset E} \prod_{i \in V} (1 - d_i(s)) \prod_{ij \in s} \frac{v_{ij}}{1 - v_{ij}} = \mathrm{E}_x\big[\det[I - i \mathcal{V} \mathcal{M}]\big],$$

where $\mathcal{V} = \mathrm{diag}\big(\sqrt{v_{ij}/(1 - v_{ij})}\; x_{i \to j}\big)$.

Proof. Expanding the determinant, one derives

$$\det[I - i \mathcal{V} \mathcal{M}] = \sum_{\{e_1, \ldots, e_n\} \subset \vec{E}} \det \mathcal{M}|_{\{e_1, \ldots, e_n\}}\, (-i)^n \prod_{l=1}^{n} (\mathcal{V})_{e_l, e_l}.$$

Evaluating the expectation of each summand, one observes that it is nonzero only if $(i \to j) \in
\{e_1, \ldots, e_n\}$ implies $(j \to i) \in \{e_1, \ldots, e_n\}$. Thus we arrive at

$$\mathrm{E}_x\big[\det[I - i \mathcal{V} \mathcal{M}]\big] = \sum_{s \subset E} (-1)^{|s|} \det \mathcal{M}|_s \prod_{ij \in s} \frac{v_{ij}}{1 - v_{ij}} = \sum_{s \subset E} r(s),$$

where $\mathcal{M}|_s$ is the restriction to the set of directed edges obtained from $s$. In the final equality,
we used the formula $\det \mathcal{M} = (-1)^{|E|} \prod_{i \in V} (1 - d_i)$, which is proved in Theorem 3.3, for the
subgraph $s$. □

The determinant in the expectation is nothing but the reciprocal of the graph zeta
function. As we show below, we obtain the loop series of Theorem 6.4 by applying the
Ihara-Bass type determinant formula.

Proof of Theorem 6.4 by Lemmas 6.3 and 6.4.
We use Lemma 6.4 in the case where the random variables $x_{ij} = x_{i \to j} = x_{j \to i}$ take ±1 values
with probability 1/2. Using the Ihara-Bass type determinant formula of Chapter 3, one derives

$$\det[I - i \mathcal{V} \mathcal{M}] = \det A \prod_{ij \in E} (1 - v_{ij})^{-1},$$

where

$$A_{ij} = \sqrt{v_{ij}(1 - v_{ij})}\; x_{ij}.$$

Therefore

$$\sum_{s \subset E} r(s) = \mathrm{E}_x\big[\det[I - i \mathcal{V} \mathcal{M}]\big] = \mathrm{E}_x[\det A] \prod_{ij \in E} (1 - v_{ij})^{-1}
= Z(v .\!* (1 - v)) \prod_{ij \in E} (1 - v_{ij})^{-1} = \frac{Z(w)}{Z_B}. \qquad □$$

6.5 Discussion 

This chapter developed an alternative method for deriving and expressing the LS for the 
partition function of binary models. The loop series expansion of marginal distributions is 
also developed. 

The form of the loop series expression reflects, in some sense, the geometry of the factor 
graph. In fact, utilizing the LS, we showed that the MPM assignment is exact for 1-cycle 
graphs. In this proof, restriction on the appearing subgraphs was essential. Such graph 
geometric viewpoints are further discussed in the next chapter, treating the LS as a graph 
polynomial. 

The loop series is not independent of the graph zeta function. Indeed, for the perfect
matching problem, the loop series is also derived from the Ihara-Bass type determinant
formula as discussed in Subsection 6.4.2. This result suggests deep connections between
LBP, the Bethe free energy and the graph zeta function. However, we do not know how
to derive the general loop series expansion based on graph zeta techniques. It would be
interesting to find such a derivation, elucidating the relation between the loop series and
the graph zeta function.



Chapter 7 

Graph polynomials from LS 



7.1 Introduction 

This chapter treats the Loop Series (LS) as a weighted graph characteristic called the theta
polynomial $\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma})$. Since the LS is the ratio of the partition function and its Bethe
approximation, elucidating the mathematical structure of the LS is of interest. In this
chapter, we only discuss the binary pairwise models.

Our motivation for the graph polynomial treatment of the LS is to "divide the problem
in two parts." The loop series is evaluated in two steps: 1. the computation of $\boldsymbol{\beta} = (\beta_{ij})_{ij \in E}$
and $\boldsymbol{\gamma} = (\gamma_i)_{i \in V}$ from an LBP solution; 2. the summation of all the contributions from the
sub-coregraphs. Since it seems difficult to derive strong results on the first step, we intend
to focus on the second step. If there is an interesting property in the form of the loop series
sum, or the $\Theta$-polynomial, the property should be related to the behavior of the error of
the partition function approximation.

For example, if the graph is a tree, the $\Theta$-polynomial is equal to one because a tree
has no sub-coregraph other than the empty one. This fact implies that the Bethe approximation gives the
exact values of the partition functions on trees. Another notable success, in this line of
approach, is the proof of $Z \geq Z_B$ for attractive models with means biased in one direction
[113]. The result can be understood from a property of $\Theta_G$: the coefficients of $\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma})$ are
non-negative. (See Subsection 6.3.2.)

Though we have not been successful in deriving properties of $\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma})$ that can be used
to derive unproved properties of the Bethe approximation, we show that the $\Theta$-polynomial
has an interesting property called the deletion-contraction relation if the vertex weights $\gamma_i$ are
set to be the same. We further analyze the bivariate graph polynomial $\theta_G(\beta, \gamma)$, which
is obtained as the two-variable version of $\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma})$, and the univariate graph polynomial
$\omega_G(\beta)$, which is obtained from $\theta_G(\beta, \gamma)$ by specializing $\gamma = 2\sqrt{-1}$ and eliminating a factor
$(1 - \beta)^{|E| - |V|}$. We believe that our results give hints for future investigations of the
$\Theta$-polynomial.

7.1.1 Basic notations and definitions 

In the first place, we review basic notations and definitions on graphs following Subsection
2.1.1. For clarity, we summarize them for the case of graphs, not hypergraphs. Let $G =
(V, E)$ be a finite graph, where $V$ is the set of vertices and $E$ is the set of undirected edges.
In this chapter, a graph means a multigraph, in which loops and multiple edges are allowed.
Note that, in graph theory, a loop is an edge that connects a vertex to itself. A subset $s$ of
$E$ is identified with the spanning subgraph $(V, s)$ of $G$ unless otherwise stated.

In this chapter, we use the symbol $e$ to represent an undirected edge, though it was mainly
used to represent a directed edge in previous chapters. By the notation $e = ij$ we mean
that vertices $i$ and $j$ are the endpoints of $e$. The number of ends of edges connecting to a
vertex $i$ is called the degree of $i$ and denoted by $d_i$.

The number of connected components of $G$ is denoted by $k(G)$. The nullity and the
rank of $G$ are defined by $n(G) := |E| - |V| + k(G)$ and $r(G) := |V| - k(G)$, respectively.
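
The nullity $|E| - |V| + k(G)$ and the rank $|V| - k(G)$ are straightforward to compute once the number of connected components is known. The sketch below is illustrative (names are introduced here); it represents a multigraph as a vertex count and a list of endpoint pairs, and counts components with a small union-find structure.

```python
def count_components(n_vertices, edges):
    """Number of connected components k(G) of a multigraph, via union-find."""
    parent = list(range(n_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(a) for a in range(n_vertices)})


def nullity(n_vertices, edges):
    return len(edges) - n_vertices + count_components(n_vertices, edges)


def rank(n_vertices, edges):
    return n_vertices - count_components(n_vertices, edges)


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(nullity(4, K4), rank(4, K4))
```

Loops and multiple edges are handled naturally: a loop never changes the components but raises the nullity by one.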

For a graph $G$, the core of the graph $G$ is given by a process of removing vertices of degree
one, together with their incident edges, step by step [109]. This graph is denoted by core($G$). A graph $G$ is called a coregraph
if $G$ = core($G$). In other words, a graph is a coregraph if and only if the degree of each
vertex is not equal to one.
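
The clipping process that produces the core can be sketched as follows. This is an illustration under one simplifying assumption: the graph is tracked through its edge list only, so the function returns the indices of the edges that survive, and vertices left without edges are simply ignored. The names are hypothetical.

```python
def core_edges(n_vertices, edges):
    """Indices of edges that survive repeated removal of degree-one vertices."""
    alive = set(range(len(edges)))
    while True:
        deg = [0] * n_vertices
        for e in alive:
            i, j = edges[e]
            deg[i] += 1
            deg[j] += 1                  # a loop (i == j) contributes two
        pendant = {e for e in alive
                   if deg[edges[e][0]] == 1 or deg[edges[e][1]] == 1}
        if not pendant:
            return sorted(alive)
        alive -= pendant                 # clip edges hanging on a degree-one vertex


def is_coregraph(n_vertices, edges):
    """True if no vertex has degree exactly one."""
    return len(core_edges(n_vertices, edges)) == len(edges)


# Triangle 0-1-2 with a pendant path 2-3-4 attached.
EXAMPLE = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
print(core_edges(5, EXAMPLE))
```

On the example the pendant path is removed in two rounds and only the triangle remains, while a tree is reduced to the empty edge set.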

For an edge $e \in E$, the graph $G \setminus e$ is obtained by deleting $e$ and $G/e$ is obtained by
contracting $e$. If $e$ is a loop, $G/e$ is the same as $G \setminus e$. The disjoint union of graphs $G_1$ and
$G_2$ is denoted by $G_1 \cup G_2$. The graph with a single vertex and $n$ loops is called the bouquet
graph and denoted by $B_n$.

7.1.2 Graph polynomials 

Partition functions studied in statistical physics have been a source of many graph
polynomials. For example, the partition functions of the q-state Potts model and the bivariate
random-cluster model of Fortuin and Kasteleyn give rise to graph polynomials. They are known
to be equivalent to the Tutte polynomial [17]. Another example is the monomer-dimer
partition function with uniform monomer and dimer weights, which is essentially the matching
polynomial [55].

Note: the term "loop" in "loopy belief propagation" and "loop series" has no relation to the
graph-theoretic definition of a loop given in Subsection 7.1.1.

The most important feature of our graph polynomials is the deletion-contraction relation:

$$\theta_G(\beta, \gamma) = (1 - \beta)\, \theta_{G \setminus e}(\beta, \gamma) + \beta\, \theta_{G/e}(\beta, \gamma),$$

$$\omega_G(\beta) = \omega_{G \setminus e}(\beta) + \beta\, \omega_{G/e}(\beta),$$

where $e \in E$ is not a loop. Furthermore, these polynomials are multiplicative:

$$\theta_{G_1 \cup G_2} = \theta_{G_1} \theta_{G_2} \quad \text{and} \quad \omega_{G_1 \cup G_2} = \omega_{G_1} \omega_{G_2}.$$

Graph invariants that satisfy the deletion-contraction relation and the multiplicative law
were studied by Tutte [120] under the name of V-functions. Our graph polynomials $\theta_G$ and $\omega_G$
are essentially examples of V-functions.

Graph polynomials that satisfy deletion-contraction relations arise from a wide range of
problems [17, 134]. To the best of our knowledge, all of the graph polynomials that satisfy
deletion-contraction relations and appear in applications are known to be equivalent
to the Tutte polynomial or obtained by its specialization. Examples include the chromatic
polynomial, the flow polynomial and the reliability polynomial. The Tutte
polynomial has a reduction formula for loops, but our new graph polynomials do not have
such reduction formulas for loops and are essentially different from the Tutte polynomial.

7.1.3 Overview of this chapter 

This chapter discusses the following topics. First, in Section 7.2, we define the weighted
graph characteristic $\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma})$. An interesting property called the deletion-contraction relation
is shown when the vertex weights at the endpoints of the contracted edge are equal. In Section
7.3, we derive upper and lower bounds on the number of sub-coregraphs, which are
attained by 3-regular graphs and bouquet graphs respectively. Section 7.4 is a discussion of
the $\theta$-polynomial. We see that the $\theta$-polynomial is essentially a new interesting example of a
special class of graph polynomials called V-functions. Section 7.5 is devoted to investigations
of the $\omega$-polynomial, including a study of the special value $\beta = 1$. We show that the
polynomial coincides with the monomer-dimer partition function with weights parameterized
by $\beta$. In particular, it is essentially the matching polynomial if the graph is regular.



7.2 Loop series as a weighted graph polynomial 
7.2.1 Definition 

In the first place, we introduce the expansion of the LS as a weighted graph polynomial.
We associate complex numbers with vertices and edges, $\boldsymbol{\gamma} = (\gamma_i)_{i \in V}$ and $\boldsymbol{\beta} = (\beta_e)_{e \in E}$
respectively, making the graph $G$ a weighted graph.

Recall that the set of polynomials $\{f_n(x)\}_{n=0}^{\infty}$ is defined inductively by the relations

$$f_0(x) = 1, \quad f_1(x) = 0, \quad \text{and} \quad f_{n+1}(x) = x f_n(x) + f_{n-1}(x). \qquad (7.1)$$

Therefore, $f_2(x) = 1$, $f_3(x) = x$ and so on. Note that these polynomials are transformations
of the Chebyshev polynomials of the second kind: $f_{n+2}(2\sqrt{-1} z) = (\sqrt{-1})^n U_n(z)$, where
$U_n(\cos\theta) = \frac{\sin((n+1)\theta)}{\sin\theta}$.

Since we are considering graphs, the expression in Eq. (6.2) reduces to the following
form.

Definition 13. Let $\boldsymbol{\beta} = (\beta_e)_{e \in E}$ and $\boldsymbol{\gamma} = (\gamma_i)_{i \in V}$ be the weights of $G$. We define

$$\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma}) := \sum_{s \subset E} \prod_{e \in s} \beta_e \prod_{i \in V} f_{d_i(s)}(\gamma_i), \qquad (7.2)$$

where $d_i(s)$ is the degree of the vertex $i$ in $s$.

If all the vertex weights are equal to $\gamma$, it is denoted by $\Theta_G(\boldsymbol{\beta}, \gamma)$. In addition, if all the
edge weights are set to be the same, it is denoted by $\theta_G(\beta, \gamma)$.
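
For small graphs the weighted polynomial can be evaluated by direct enumeration of edge subsets, with each subset contributing the product of its edge weights times $f_{d_i(s)}(\gamma_i)$ over the vertices. The following sketch is illustrative only; the function names and the encoding of a multigraph as a list of endpoint pairs are choices made here.

```python
from itertools import combinations


def f_value(n, x):
    """Evaluate f_n(x) from f_0 = 1, f_1 = 0, f_{k+1} = x*f_k + f_{k-1}."""
    prev, cur = 1, 0
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def theta(n_vertices, edges, beta, gamma):
    """Brute-force sum over all edge subsets s of the weighted contributions."""
    total = 0
    for k in range(len(edges) + 1):
        for s in combinations(range(len(edges)), k):
            deg = [0] * n_vertices
            term = 1
            for e in s:
                i, j = edges[e]
                deg[i] += 1
                deg[j] += 1              # a loop adds two to the degree
                term *= beta[e]
            for i in range(n_vertices):
                term *= f_value(deg[i], gamma[i])
            total += term
    return total


TRIANGLE = [(0, 1), (1, 2), (2, 0)]
print(theta(3, TRIANGLE, [2, 2, 2], [3, 3, 3]))
```

Because $f_1 = 0$, any subset with a vertex of degree one contributes nothing, so on the triangle only the empty set and the full cycle remain.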

In Eq. (7.2), there is a summation over all subsets of $E$. Recall that an edge set $s$ is
identified with the spanning subgraph $(V, s)$. Since $f_1(x) = 0$, the subgraph $s$ makes a
contribution to the summation only if $s$ does not have a vertex of degree one. Therefore,
the summation is regarded as the summation over all coregraphs of the form $(V, s)$; we
call them sub-coregraphs. In relevant papers, such subgraphs are called generalized loops
[23, 25] or closed subgraphs [90, 91].

It is trivial by definition that

$$\Theta_{G_1 \cup G_2}(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \Theta_{G_1}(\boldsymbol{\beta}, \boldsymbol{\gamma})\, \Theta_{G_2}(\boldsymbol{\beta}, \boldsymbol{\gamma}), \qquad (7.3)$$

$$\Theta_{B_0}(\boldsymbol{\beta}, \boldsymbol{\gamma}) = 1, \qquad (7.4)$$

$$\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma}) = \Theta_{\mathrm{core}(G)}(\boldsymbol{\beta}, \boldsymbol{\gamma}). \qquad (7.5)$$

These properties are reminiscent of the properties of the graph zeta function: $\zeta_{H_1 \cup H_2} =
\zeta_{H_1} \zeta_{H_2}$, $\zeta_{\emptyset} = 1$ and $\zeta_H = \zeta_{\mathrm{core}(H)}$.

7.2.2 Deletion-contraction relation 

Assuming a certain relation on $\boldsymbol{\gamma} = (\gamma_i)_{i \in V}$, we prove the most important property of the
graph polynomial, called the deletion-contraction relation. The following formula for $f_n(x)$
is essential in the proof of the relation.

Lemma 7.1. For all $n, m \in \mathbb{N}$,

$$f_{n+m-2}(x) = f_n(x) f_m(x) + f_{n-1}(x) f_{m-1}(x).$$

Proof. Easily proved by induction using Eq. (7.1). □
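
The product identity $f_{n+m-2}(x) = f_n(x) f_m(x) + f_{n-1}(x) f_{m-1}(x)$ can be spot-checked with exact integer arithmetic before attempting the induction. The sketch below is illustrative, and the helper names are introduced here.

```python
def f_value(n, x):
    """Evaluate f_n(x) from f_0 = 1, f_1 = 0, f_{k+1} = x*f_k + f_{k-1}."""
    prev, cur = 1, 0
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def identity_holds(n, m, x):
    """Check the product identity for given n, m >= 1 at the integer point x."""
    lhs = f_value(n + m - 2, x)
    rhs = f_value(n, x) * f_value(m, x) + f_value(n - 1, x) * f_value(m - 1, x)
    return lhs == rhs


print(all(identity_holds(n, m, x)
          for n in range(1, 10) for m in range(1, 10) for x in range(-3, 4)))
```

At $x = 1$ the values $f_n(1)$ are Fibonacci numbers shifted by one index, and the identity becomes a familiar Fibonacci addition formula.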

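Theorem 7.1 (Deletion-contraction relation). Let $e = ij \in E$ be an edge that is not a loop. Assume that
the weights $(\boldsymbol{\beta}, \boldsymbol{\gamma})$ on $G$ satisfy $\gamma_i = \gamma_j$. The weights on $G \setminus e$ and $G/e$ are naturally induced
and denoted by $(\boldsymbol{\beta}', \boldsymbol{\gamma}')$ and $(\boldsymbol{\beta}'', \boldsymbol{\gamma}'')$ respectively. (On $G/e$, the weight on the new vertex,
which is the fusion of $i$ and $j$, is set to be $\gamma_i$.) Under these conditions, we have

$$\Theta_G(\boldsymbol{\beta}, \boldsymbol{\gamma}) = (1 - \beta_e)\, \Theta_{G \setminus e}(\boldsymbol{\beta}', \boldsymbol{\gamma}') + \beta_e\, \Theta_{G/e}(\boldsymbol{\beta}'', \boldsymbol{\gamma}'').$$

Proof. Classify the subgraphs $s$ in the sum of Eq. (7.2) according to whether $s$ includes $e$ or not. The subgraphs
$s \ni e = ij$ yield $-\beta_e \Theta_{G \setminus e} + \beta_e \Theta_{G/e}$, where Lemma 7.1 is used with $n = d_i(s)$
and $m = d_j(s)$. The subgraphs $s \not\ni e$ yield $\Theta_{G \setminus e}$. □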
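
The deletion-contraction relation can be tested exactly on a small weighted graph. The sketch below is an illustration with a common vertex weight $\gamma$ and distinct edge weights, using rational arithmetic. One implementation shortcut is an assumption made here: contraction relabels endpoint $j$ as $i$ and leaves $j$ behind as an isolated vertex, which is harmless because an isolated vertex contributes $f_0 = 1$. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def f_value(n, x):
    prev, cur = 1, 0
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, x * cur + prev
    return cur


def theta(n_vertices, edges, beta, gamma):
    """Brute-force theta polynomial with one common vertex weight gamma."""
    total = 0
    for k in range(len(edges) + 1):
        for s in combinations(range(len(edges)), k):
            deg = [0] * n_vertices
            term = 1
            for e in s:
                i, j = edges[e]
                deg[i] += 1
                deg[j] += 1
                term *= beta[e]
            for d in deg:
                term *= f_value(d, gamma)
            total += term
    return total


def delete(edges, beta, e):
    return edges[:e] + edges[e + 1:], beta[:e] + beta[e + 1:]


def contract(edges, beta, e):
    i, j = edges[e]                      # merge j into i; j stays isolated
    merged = [(i if a == j else a, i if b == j else b)
              for k, (a, b) in enumerate(edges) if k != e]
    return merged, beta[:e] + beta[e + 1:]


def relation_holds(n_vertices, edges, beta, gamma, e):
    d_edges, d_beta = delete(edges, beta, e)
    c_edges, c_beta = contract(edges, beta, e)
    rhs = ((1 - beta[e]) * theta(n_vertices, d_edges, d_beta, gamma)
           + beta[e] * theta(n_vertices, c_edges, c_beta, gamma))
    return theta(n_vertices, edges, beta, gamma) == rhs


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
BETA = [Fraction(k + 1, 10) for k in range(6)]
GAMMA = Fraction(7, 10)
print(all(relation_holds(4, K4, BETA, GAMMA, e) for e in range(6)))
```

Parallel edges created by a contraction become loops in later steps, which is why the multigraph setting is needed.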

In particular, $\Theta_G(\boldsymbol{\beta}, \gamma)$ satisfies this relation. By successive applications of the relation,
$\Theta_G(\boldsymbol{\beta}, \gamma)$ can be reduced to the values at disjoint unions of bouquet graphs. The
deletion-contraction relation allows another expansion of $\Theta_G(\boldsymbol{\beta}, \gamma)$ as a sum over the subgraphs.

Theorem 7.2.

$$\Theta_G(\boldsymbol{\beta}, \gamma) = \sum_{s \subset E} \prod_{n=0}^{\infty} \theta_{B_n}(1, \gamma)^{i_n(s)} \prod_{e \in s} \beta_e \prod_{e \in E \setminus s} (1 - \beta_e), \qquad (7.6)$$

where $i_n(s)$ is the number of connected components of the subgraph $s$ with nullity $n$.

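Proof. In this proof, the right-hand side of Eq. (7.6) is denoted by $\hat{\Theta}_G(\boldsymbol{\beta}, \gamma)$. First, we check
that $\Theta_G$ and $\hat{\Theta}_G$ are equal at the bouquet graphs:

$$\hat{\Theta}_{B_n}(\boldsymbol{\beta}, \gamma) = \sum_{s \subset E} \theta_{B_{|s|}}(1, \gamma) \prod_{e \in s} \beta_e \prod_{e \in E \setminus s} (1 - \beta_e)$$

$$= \sum_{s \subset E} \sum_{k=0}^{|s|} \binom{|s|}{k} f_{2k}(\gamma) \prod_{e \in s} \beta_e \sum_{t \subset E \setminus s} \prod_{e \in t} (-\beta_e)$$

$$= \sum_{u \subset E} \sum_{s \subset u} \sum_{k=0}^{|s|} \binom{|s|}{k} f_{2k}(\gamma) (-1)^{|u| - |s|} \prod_{e \in u} \beta_e$$

$$= \sum_{u \subset E} \sum_{l=0}^{|u|} \sum_{k=0}^{l} \binom{|u|}{l} \binom{l}{k} f_{2k}(\gamma) (-1)^{|u| - l} \prod_{e \in u} \beta_e.$$

Using the equality $\sum_{j=k}^{n} \binom{n}{j} \binom{j}{k} (-1)^{n+j} = \delta_{n,k}$, which is obtained by comparing the
coefficients of $(1 - (1 - x))^n = x^n$, we have

$$\hat{\Theta}_{B_n}(\boldsymbol{\beta}, \gamma) = \sum_{u \subset E} f_{2|u|}(\gamma) \prod_{e \in u} \beta_e = \Theta_{B_n}(\boldsymbol{\beta}, \gamma).$$

Secondly, we see that $\hat{\Theta}_G(\boldsymbol{\beta}, \gamma)$ satisfies the deletion-contraction relation

$$\hat{\Theta}_G(\boldsymbol{\beta}, \gamma) = (1 - \beta_e)\, \hat{\Theta}_{G \setminus e}(\boldsymbol{\beta}', \gamma) + \beta_e\, \hat{\Theta}_{G/e}(\boldsymbol{\beta}'', \gamma)$$

for all non-loop edges $e$, because the subsets including $e$ amount to $\beta_e \hat{\Theta}_{G/e}(\boldsymbol{\beta}'', \gamma)$ and the
other subsets amount to $(1 - \beta_e) \hat{\Theta}_{G \setminus e}(\boldsymbol{\beta}', \gamma)$.

Applying this form of the deletion-contraction relation to both $\Theta_G$ and $\hat{\Theta}_G$, we can reduce
the values at $G$ to those at disjoint unions of the same bouquet graphs. Therefore we
conclude that $\Theta_G = \hat{\Theta}_G$. □

Eq. (7.6) resembles the famous random-cluster model of Fortuin and Kasteleyn [38],

$$R_G(\boldsymbol{\beta}, \kappa) = \sum_{s \subset E} \kappa^{k(s)} \prod_{e \in s} \beta_e \prod_{e \in E \setminus s} (1 - \beta_e),$$

which is a special case of the colored Tutte polynomial in [18]. This function satisfies a
deletion-contraction relation of the form

$$R_G(\boldsymbol{\beta}, \kappa) = (1 - \beta_e)\, R_{G \setminus e}(\boldsymbol{\beta}', \kappa) + \beta_e\, R_{G/e}(\boldsymbol{\beta}'', \kappa) \quad \text{for all } e \in E.$$

Note that this relation holds also for loops, in contrast to $\Theta_G(\boldsymbol{\beta}, \gamma)$. This difference comes
from that of the coefficients of subgraphs $s$: $\kappa^{k(s)}$ and $\prod_n \theta_{B_n}(1, \gamma)^{i_n(s)}$.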



7.3 Number of sub-coregraphs 

An important property of the LS is that only the sub-coregraphs contribute to the sum,
not all the subgraphs. Thus it is worth investigating how many of the subgraphs of a graph
are sub-coregraphs. This section discusses the number of sub-coregraphs, i.e., the number
of terms in the LS. A simple analysis of the theta polynomial gives bounds on this number.
First, we compute $\theta_G(1, \gamma)$.

Lemma 7.2. For a connected graph $G$,

$$\theta_G(1, \xi - \xi^{-1}) = \xi^{1 - n(G)} (\xi + \xi^{-1})^{n(G) - 1} + \xi^{n(G) - 1} (\xi + \xi^{-1})^{n(G) - 1}. \qquad (7.7)$$

Note that the value $\theta_G(1, \gamma)$ is determined by the nullity $n(G)$.

Proof. For the proof, we use Lemma 6.2, which gives an alternative representation of $\Theta_G$.
In the graph case, Eq. (6.7) reduces to

$$\Theta_G(\boldsymbol{\beta}, (\xi_i - \xi_i^{-1})_{i \in V}) = \sum_{x_1, \ldots, x_N = \pm 1} \ \prod_{e \in E,\, e = ij} (1 + x_i x_j \beta_e \xi_i^{-x_i} \xi_j^{-x_j}) \prod_{i \in V} \frac{\xi_i^{x_i}}{\xi_i + \xi_i^{-1}}. \qquad (7.8)$$

We set $\beta_e = 1$ and $\xi_i = \xi$. If $x_i \neq x_j$, then $1 + x_i x_j \xi^{-x_i} \xi^{-x_j} = 0$. Since $G$ is connected,
only the two terms $x_1 = \cdots = x_N = 1$ and $x_1 = \cdots = x_N = -1$ contribute to the sum.
Then the equality is proved. □

If $\xi = \frac{1 + \sqrt{5}}{2}$, then $\xi - \xi^{-1} = 1$. From Eq. (7.7), we see that

$$\theta_G(1, 1) = \Big( \frac{5 - \sqrt{5}}{2} \Big)^{n(G) - 1} + \Big( \frac{5 + \sqrt{5}}{2} \Big)^{n(G) - 1}. \qquad (7.9)$$

Setting $\xi = 1$, we also deduce from Eq. (7.7) that

$$\theta_G(1, 0) = 2^{n(G)}. \qquad (7.10)$$

7.3.1 Bounds on the number of sub-coregraphs 

For a given graph G, let $C(G) := \{s \,;\, s\subset E,\ (V,s) \text{ is a coregraph}\}$ be the set of
sub-coregraphs of G. In the following theorem, the values in Eqs. (7.9) and (7.10) are used to
bound the number of sub-coregraphs. The upper bound was first proved in [127].

Theorem 7.3. For a connected graph G,

$$2^{n(G)} \le |C(G)| \le \Bigl(\frac{5-\sqrt5}{2}\Bigr)^{n(G)-1} + \Bigl(\frac{5+\sqrt5}{2}\Bigr)^{n(G)-1}. \tag{7.11}$$

The lower bound is attained if and only if core(G) is a subdivision of a bouquet graph, and
the upper bound is attained if and only if core(G) is a subdivision of a 3-regular graph or G
is a tree.

Note that a subdivision of a graph G is a graph obtained by adding vertices of
degree 2 on its edges.

Proof. It is enough to consider the case that G is a coregraph and does not have vertices of
degree 2, because the operations of taking the core and of subdivision essentially change neither
the nullity nor the set of sub-coregraphs.

From the definition Eq. (7.13), we can write

$$\theta_G(1,\gamma) = \sum_{s\in C(G)} w(s;\gamma),$$

where $w(s;\gamma) = \prod_{i\in V} f_{d_i(s)}(\gamma)$. For all $s\in C(G)$, we claim that

$$w(s;0) \le 1 \le w(s;1). \tag{7.12}$$

The left inequality of Eq. (7.12) is immediate from the fact that $f_n(0) = 1$ if n is even and
$f_n(0) = 0$ if n is odd. The equality holds if and only if all vertices have even degree in s.
Since $f_n(1) > 1$ for all $n \ge 4$ and $f_2(1) = f_3(1) = 1$, we have $w(s;1) \ge 1$. The equality
holds if and only if $d_i(s) \le 3$ for all $i\in V$. Summing Eq. (7.12) over $s\in C(G)$ and using
Eqs. (7.9) and (7.10), the inequalities in Eq. (7.11) are proved. The upper bound is attained
if and only if G is a 3-regular graph or $B_0$. For the equality condition of the lower bound,
it is enough to prove the following claim.

Claim. Let G be a connected graph, and assume that the degree of every vertex is at least
3 and that $d_i(s)$ is even for every $i\in V$ and $s\in C(G)$. Then G is a bouquet graph.

If G is not a bouquet graph, there is a non-loop edge $e = i_0j_0$. Then E and $E\setminus e$ are
sub-coregraphs of G. Thus $d_{i_0}(E)$ or $d_{i_0}(E\setminus e) = d_{i_0}(E) - 1$ is odd. This is a contradiction. □
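
The bounds of Theorem 7.3 can be checked by brute force on small graphs. The following is a minimal sketch, not part of the original analysis: it assumes that a sub-coregraph is an edge subset in which no vertex has degree exactly one, and the graph encoding (vertices `0..num_vertices-1`, edges as vertex pairs) and the names `count_sub_coregraphs` and `theorem_7_3_bounds` are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def count_sub_coregraphs(num_vertices, edges):
    """Count edge subsets s in which no vertex of (V, s) has degree exactly 1."""
    count = 0
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            degree = [0] * num_vertices
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1  # a loop (u == v) contributes 2 to the degree
            if all(d != 1 for d in degree):
                count += 1
    return count


def theorem_7_3_bounds(nullity):
    """Lower and upper bounds of Eq. (7.11) for a connected graph of given nullity."""
    lower = 2 ** nullity
    upper = ((5 - sqrt(5)) / 2) ** (nullity - 1) + ((5 + sqrt(5)) / 2) ** (nullity - 1)
    return lower, upper


# Illustrative test graphs (assumed encodings, both of nullity 3).
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # 3-regular
B3 = [(0, 0), (0, 0), (0, 0)]                          # bouquet of three loops

for name, n_vertices, edges in [("K4", 4, K4), ("B3", 1, B3)]:
    nullity = len(edges) - n_vertices + 1
    lower, upper = theorem_7_3_bounds(nullity)
    print(name, lower, count_sub_coregraphs(n_vertices, edges), round(upper, 6))
```

In this sketch the 3-regular graph $K_4$ sits at the upper bound and the bouquet graph $B_3$ at the lower bound, in line with the equality conditions of the theorem. The enumeration is exponential in the number of edges and is only meant for toy examples.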

7.3.2 Number of sub-coregraphs in 3-regular graphs 

If the core of a graph is a subdivision of a 3-regular graph, we obtain more information on 
the number of specific types of sub-coregraphs. 

We can rewrite Lemma 7.2 as follows.

Lemma 7.3. Let G be connected and not a tree. Then we have

$$\theta_G(1,\gamma) = \sum_{l=0}^{n(G)-1} C_{n(G),l}\,\gamma^{2l},$$

where $C_{n,l} := \sum_{k=l+1}^{n}\binom{n}{k}\binom{k+l-1}{2l}$ for $1 \le l \le n-1$ and $C_{n,0} := 2^n$.

Proof. First we note that for $k \ge 1$,

$$f_{2k}(\gamma) = \sum_{l=0}^{k-1}\binom{k+l-1}{2l}\gamma^{2l}$$

and

$$f_{2k+1}(\gamma) = \sum_{l=0}^{k-1}\binom{k+l}{2l+1}\gamma^{2l+1}.$$

This is easily proved inductively using Eq. (7.1). Then Lemma 7.2 derives

$$\theta_G(1,\gamma) = \theta_{B_{n(G)}}(1,\gamma) = \sum_{k=1}^{n(G)}\binom{n(G)}{k}f_{2k}(\gamma) + f_0(\gamma) = \sum_{l=0}^{n(G)-1} C_{n(G),l}\,\gamma^{2l}.$$

□



Theorem 7.4. Let G be a connected graph that is not a tree. If every vertex of core(G) has
degree at most 3, then

$$C_{n(G),l} = \bigl|\{s\in C(G)\,;\ s \text{ has exactly } 2l \text{ vertices of degree } 3\}\bigr|$$

for $0 \le l \le n(G)-1$.

Proof. From the assumption of this theorem, all degrees in a sub-coregraph s are at most
three. Accordingly, for a sub-coregraph s, $\prod_{i\in V} f_{d_i(s)}(\gamma) = \gamma^{2l}$ holds, where 2l is the
number of vertices of degree three in the subgraph s. Therefore, Lemma 7.3 implies the
assertion. □
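
As a sanity check of Theorem 7.4, the coefficients $C_{n,l}$ can be compared with a direct enumeration. The sketch below is illustrative only: it uses the formula $C_{n,l} = \sum_{k=l+1}^{n}\binom{n}{k}\binom{k+l-1}{2l}$ with $C_{n,0} = 2^n$, takes a sub-coregraph to be an edge subset without vertices of degree one, and the function names and graph encoding are assumptions.

```python
from itertools import combinations
from math import comb


def c_coeff(n, l):
    """C_{n,l}: 2^n for l = 0, otherwise sum_{k=l+1}^{n} C(n,k) * C(k+l-1, 2l)."""
    if l == 0:
        return 2 ** n
    return sum(comb(n, k) * comb(k + l - 1, 2 * l) for k in range(l + 1, n + 1))


def degree3_histogram(num_vertices, edges):
    """Map l -> number of sub-coregraphs with exactly 2l vertices of degree 3.

    Intended for graphs whose degrees are at most 3, as in Theorem 7.4.
    """
    hist = {}
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            degree = [0] * num_vertices
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            if any(d == 1 for d in degree):
                continue  # not a sub-coregraph
            threes = sum(1 for d in degree if d == 3)
            hist[threes // 2] = hist.get(threes // 2, 0) + 1
    return hist


# Illustrative 3-regular examples (assumed encodings).
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]  # nullity 3
TRIPLE_EDGE = [(0, 1), (0, 1), (0, 1)]                 # nullity 2

print(degree3_histogram(4, K4), [c_coeff(3, l) for l in range(3)])
print(degree3_histogram(2, TRIPLE_EDGE), [c_coeff(2, l) for l in range(2)])
```

The histogram depends only on the nullity, as the theorem predicts: any two such graphs of equal nullity give the same counts.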

For example, consider the hypergraph in Figure 6.1 as a graph. Since this graph is a
subdivision of a 3-regular graph, we can apply Theorem 7.4. This graph has nullity 2, and

$$C_{2,1} = 1, \qquad C_{2,0} = 4.$$



From the figure, one observes that Theorem 7.4 is correct for this case.



7.4 Bivariate graph polynomial $\theta_G$

In this section, we discuss $\theta_G$. For a given graph G,

$$\theta_G(\beta,\gamma) := \sum_{s\subset E}\beta^{|s|}\prod_{i\in V} f_{d_i(s)}(\gamma)\ \in \mathbb{Z}[\beta,\gamma], \tag{7.13}$$

where $d_i(s)$ is the degree of the vertex i in s.

Figure 7.1: Graph $X_1$ and $X_2$

7.4.1 Basic properties

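The following facts are immediate from the previous results.

Proposition 7.1.

(a) $\theta_{G_1\cup G_2}(\beta,\gamma) = \theta_{G_1}(\beta,\gamma)\,\theta_{G_2}(\beta,\gamma)$.

(b) $\theta_{B_n}(\beta,\gamma) = \sum_{k=0}^{n}\binom{n}{k}f_{2k}(\gamma)\beta^k$.

(c) $\theta_G(\beta,\gamma) = \theta_{\mathrm{core}(G)}(\beta,\gamma)$.

(d) $\theta_G(\beta,\gamma) = (1-\beta)\,\theta_{G\setminus e}(\beta,\gamma) + \beta\,\theta_{G/e}(\beta,\gamma)$ for a non-loop edge e.

(e) $\theta_G(\beta,\gamma) = \sum_{s\subset E}\prod_{n=0}^{\infty}\theta_{B_n}(1,\gamma)^{i_n(s)}\,\beta^{|s|}(1-\beta)^{|E|-|s|}$.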

Example 7. For a tree T, $\theta_T(\beta,\gamma) = 1$. For the cycle graph $C_n$, which has n vertices and n
edges, $\theta_{C_n}(\beta,\gamma) = 1+\beta^n$. For the complete graph $K_4$, $\theta_{K_4}(\beta,\gamma) = 1+4\beta^3+3\beta^4+6\beta^5\gamma^2+\beta^6\gamma^4$.
For the graph $X_1$, which is in Figure 7.1, $\theta_{X_1}(\beta,\gamma) = 1+3\beta^2+\beta^3\gamma^2$. For the graph
$X_2$, which is also in Figure 7.1, $\theta_{X_2}(\beta,\gamma) = 1+2\beta+\beta^2+\beta^3\gamma^2$.
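
The closed forms in Example 7 can be reproduced numerically from the definition Eq. (7.13). The sketch below is a hedged illustration: it assumes that $f_n$ is generated by $f_0 = 1$, $f_1 = 0$, $f_{n+1}(\gamma) = \gamma f_n(\gamma) + f_{n-1}(\gamma)$ (a recurrence consistent with every value of $f_n$ quoted in this chapter), and it guesses the shapes of $X_1$ (two vertices joined by three parallel edges) and $X_2$ (two loops joined by a bridge) from their polynomials. All function names are hypothetical.

```python
from itertools import combinations


def f(n, gamma):
    """f_n(gamma) with f_0 = 1, f_1 = 0, f_{n+1} = gamma * f_n + f_{n-1} (assumed recurrence)."""
    prev, cur = 1, 0  # f_0, f_1
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, gamma * cur + prev
    return cur


def theta(num_vertices, edges, beta, gamma):
    """Evaluate theta_G(beta, gamma) as the sum over all edge subsets in Eq. (7.13)."""
    total = 0
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            degree = [0] * num_vertices
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            term = beta ** size
            for d in degree:
                term *= f(d, gamma)  # vanishes as soon as some degree equals 1
            total += term
    return total


# Assumed encodings of the graphs of Example 7.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
X1 = [(0, 1), (0, 1), (0, 1)]
X2 = [(0, 0), (0, 1), (1, 1)]

b, g = 2, 3  # arbitrary integer test point
print(theta(4, K4, b, g), 1 + 4 * b**3 + 3 * b**4 + 6 * b**5 * g**2 + b**6 * g**4)
print(theta(2, X1, b, g), 1 + 3 * b**2 + b**3 * g**2)
print(theta(2, X2, b, g), 1 + 2 * b + b**2 + b**3 * g**2)
```

Evaluating at integer points keeps the arithmetic exact; agreement at a few points is evidence for the closed forms, not a proof.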



7.4.2 $\theta_G$ as a Tutte's V-function

In 1947 [120], Tutte defined a class of graph invariants called V-functions. The definition is
as follows.

Definition 14. Let $\mathcal{G}$ be the set of isomorphism classes of finite undirected graphs, with
loops and multiple edges allowed. Let R be a commutative ring. A map $V : \mathcal{G} \to R$ is called
a V-function if it satisfies the following two conditions:

(i) $V(G) = V(G\setminus e) + V(G/e)$ if $e\in E$ is not a loop,

(ii) $V(G_1\cup G_2) = V(G_1)V(G_2)$.

Our graph invariant $\theta$ is essentially an example of a V-function. In the definition of
V-functions, the coefficients of the deletion-contraction relation are 1, while those of $\theta$ are
$(1-\beta)$ and $\beta$. If we modify $\theta$ to

$$\hat\theta_G(\beta,\gamma) := (1-\beta)^{-|E|+|V|}\beta^{-|V|}\theta_G(\beta,\gamma),$$

this is a V-function $\hat\theta : \mathcal{G} \to \mathbb{Z}[\beta,\gamma,\beta^{-1},(1-\beta)^{-1}]$.

By successive applications of the conditions of a V-function, we can reduce the value
at any graph to the values at bouquet graphs. Therefore we can say that a V-function is
completely determined by its boundary condition, i.e., the values at the bouquet graphs.
Conversely, Tutte shows in [120] that for an arbitrary boundary condition, there is a
V-function that satisfies it. More explicitly, the V-function satisfying a boundary condition
$\{V(B_n)\}_{n=0}^{\infty}$ is given by

$$V(G) = \sum_{s\subset E}\prod_{n=0}^{\infty} z_n^{i_n(s)}, \tag{7.14}$$

where $z_n := \sum_{j=0}^{n}\binom{n}{j}(-1)^{n+j}V(B_j)$ and $i_n(s)$ is the number of connected components of
the subgraph s with nullity n. In our case of $\hat\theta$, the expansion Eq. (7.14) is equivalent to
(e) of Proposition 7.1.

Formula (7.13) and (e) of Proposition 7.1 are both sums over subsets of edges, but the
terms attached to a subset are different. Generally, a V-function does not
have a representation corresponding to Eq. (7.13); this representation makes $\theta$ worth
investigating among V-functions.



7.4.3 Comparison with Tutte polynomial

The most famous example of a V-function is the Tutte polynomial multiplied by a trivial
factor. Many graph polynomials that appear in computer science, engineering, statistical
physics, etc., are equivalent to the Tutte polynomial or its specializations [33]. The Tutte
polynomial is defined by

$$T_G(x,y) := \sum_{s\subset E}(x-1)^{r(G)-r(s)}(y-1)^{n(s)}, \tag{7.15}$$

where $r(s) := |V|-k(s)$ denotes the rank and $n(s)$ the nullity of the subgraph (V, s).
It satisfies a deletion-contraction relation

$$T_G(x,y) = \begin{cases} x\,T_{G\setminus e}(x,y) & \text{if } e \text{ is a bridge},\\ y\,T_{G\setminus e}(x,y) & \text{if } e \text{ is a loop},\\ T_{G\setminus e}(x,y)+T_{G/e}(x,y) & \text{otherwise}. \end{cases}$$

It is easy to see that $\hat T_G(x,y) := (x-1)^{k(G)}T_G(x,y)$ is a V-function to $\mathbb{Z}[x,y]$. For bouquet
graphs, $\hat T_{B_n}(x,y) = (x-1)y^n$. In the case of the Tutte polynomial, Eq. (7.14) derives Eq. (7.15).

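The V-functions $\hat\theta$ and $\hat T$ are essentially different. The assertion in the following remark
implies the difference irrespective of transforms between $(\beta,\gamma)$ and $(x,y)$.

Remark. For any field K and inclusions $\phi_1 : \mathbb{Z}[\beta,\gamma,\beta^{-1},(1-\beta)^{-1}] \hookrightarrow K$ and $\phi_2 : \mathbb{Z}[x,y] \hookrightarrow K$,
we have

$$\phi_1\circ\hat\theta \neq \phi_2\circ\hat T.$$

Proof. It is easy to see that $\phi_2(\hat T_{B_n})/\phi_2(\hat T_{B_0}) = \phi_2(y)^n$ and
$\phi_1(\hat\theta_{B_n})/\phi_1(\hat\theta_{B_0}) = \phi_1(1-\beta)^{-n}\phi_1\bigl(\sum_{k=0}^{n}\binom{n}{k}f_{2k}(\gamma)\beta^k\bigr)$.
If $\phi_1\circ\hat\theta = \phi_2\circ\hat T$, then $a_n := \sum_{k=0}^{n}\binom{n}{k}f_{2k}(\gamma')\beta'^k = z^n$ for some
$z\in K$, where $\gamma' = \phi_1(\gamma)$ and $\beta' = \phi_1(\beta)$. The equation $a_1^2 = a_2$ gives $\gamma'^2\beta'^2 = 0$. This is a
contradiction because $\beta' \neq 0$ and $\gamma' \neq 0$. □

As suggested in the proof of the above remark, if we set $\gamma = 0$, the polynomial $\theta_G(\beta,0)$
is included in the Tutte polynomial.

Proposition 7.2.

$$\theta_G(\beta,0) = (1-\beta)^{n(G)}\beta^{|V|-k(G)}\,T_G\Bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\Bigr).$$

Proof. From Proposition 7.1 (b) and $f_{2k}(0) = 1$, we have

$$\hat\theta_{B_n}(\beta,0) = (1-\beta)^{1-n}\beta^{-1}\sum_{k=0}^{n}\binom{n}{k}\beta^k = (1-\beta)^{1-n}\beta^{-1}(1+\beta)^n.$$

We also have $\hat T_{B_n}\bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\bigr) = (\beta^{-1}-1)\bigl(\frac{1+\beta}{1-\beta}\bigr)^n$. Therefore
$\hat\theta_{B_n}(\beta,0) = \hat T_{B_n}\bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\bigr)$. Since
V-functions are determined by the values at the bouquet graphs, $\hat\theta_G(\beta,0) = \hat T_G\bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\bigr)$
holds for any graph G. □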



This result is natural in view of the Ising partition function Eq. (1.5) with uniform
coupling constant J and no external fields ($h_i = 0$). Here, for simplicity, we call it the simple
Ising partition function. The Tutte polynomial is equivalent to the partition function of
the q-Potts model [17], where q is the number of allowed states at each vertex. If we set
q = 2, it becomes the simple Ising partition function. In terms of the Tutte polynomial, this
corresponds to the parameters $(x,y) = \bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\bigr)$. Therefore, $T_G\bigl(\frac{1}{\beta},\frac{1+\beta}{1-\beta}\bigr)$ is the simple Ising
partition function in essence. On the other hand, at $\gamma = 0$, $\theta_G(\beta,0)$ is also essentially the
simple Ising partition function, because the representation of $\theta_G(\beta,0)$ as a sum
over sub-coregraphs coincides with the expansion of van der Waerden [131].

We can say that the Tutte polynomial is an extension of the simple Ising partition
function to the q-state model, while the polynomial $\theta$ is an extension of it to a model with
a specific form of local external fields.

7.5 Univariate graph polynomial $\omega_G$
7.5.1 Definition and basic properties

In this section we define the second graph polynomial $\omega$ by setting $\gamma = 2\sqrt{-1}$. It is easy to
check that $f_n(2\sqrt{-1}) = (\sqrt{-1})^n(1-n)$, using the recurrence defining $f_n$. Therefore

$$\theta_G(\beta,2\sqrt{-1}) = \sum_{s\subset E}(-\beta)^{|s|}\prod_{i\in V}(1-d_i(s)). \tag{7.16}$$

An interesting point of this specialization is the relation to the monomer-dimer partition
function with a specific form of monomer-dimer weights, as described in Section 7.5.2.
Furthermore, the product of $(1-d_i(s))$ resembles the loop series of the perfect matching problem
given in Theorem 6.4.

From Eq. (7.7), $\theta_G(1,2\sqrt{-1}) = 0$ unless all the nullities of the connected components of
G are less than 2. The following theorem asserts that $\theta_G(\beta,2\sqrt{-1})$ can be divided by
$(1-\beta)^{|E|-|V|}$. We define $\omega_G$ by dividing out that factor.

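Theorem 7.5.

$$\omega_G(\beta) := \frac{\theta_G(\beta,2\sqrt{-1})}{(1-\beta)^{|E|-|V|}} \in \mathbb{Z}[\beta].$$

In Eq. (7.16), $\theta_G(\beta,2\sqrt{-1})$ is given by the summation over all sub-coregraphs, and
each term is not necessarily divisible by $(1-\beta)^{|E|-|V|}$. If we use the representation in (e)
of Proposition 7.1, however, each summand is divisible by the factor, as we show in the
following theorem. Theorem 7.5 is a trivial consequence of Theorem 7.6.

Theorem 7.6.

$$\omega_G(\beta) = \sum_{s\subset E}\beta^{|s|}\prod_{n=0}^{\infty}h_n(\beta)^{i_n(s)},$$

where $h_0(\beta) := (1-\beta)$, $h_1(\beta) := 2$ and $h_n(\beta) := 0$ for $n \ge 2$.

Proof. From Proposition 7.1 (b) and $f_m(2\sqrt{-1}) = (\sqrt{-1})^m(1-m)$, we have

$$\theta_{B_n}(1,2\sqrt{-1}) = \sum_{k=0}^{n}\binom{n}{k}(-1)^k(1-2k) = \begin{cases} 1 & \text{if } n = 0,\\ 2 & \text{if } n = 1,\\ 0 & \text{if } n \ge 2. \end{cases}$$

Proposition 7.1 (e) gives

$$\omega_G(\beta) = \sum_{s\subset E}\beta^{|s|}(1-\beta)^{|V|-|s|}\prod_{n=0}^{\infty}\theta_{B_n}(1,2\sqrt{-1})^{i_n(s)} = \sum_{s\subset E}\beta^{|s|}\prod_{n=0}^{\infty}h_n(\beta)^{i_n(s)},$$

where the last equality uses $|V|-|s| = \sum_n (1-n)\,i_n(s) = i_0(s)$ for every s whose components all have nullity at most 1.
Then the assertion is proved. □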
Example 8.

For a tree T, $\omega_T(\beta) = 1-\beta$. For the cycle graph $C_n$, $\omega_{C_n}(\beta) = 1+\beta^n$. For the complete
graph $K_4$, $\omega_{K_4}(\beta) = 1+2\beta+3\beta^2+8\beta^3+16\beta^4$. For the graphs in Figure 7.1, $\omega_{X_1}(\beta) = 1+\beta+4\beta^2$
and $\omega_{X_2}(\beta) = 1+3\beta+4\beta^2$.

We list basic properties of $\omega$ below.

Proposition 7.3.

(a) $\omega_{G_1\cup G_2}(\beta) = \omega_{G_1}(\beta)\,\omega_{G_2}(\beta)$.

(b) $\omega_G(\beta) = \omega_{G\setminus e}(\beta) + \beta\,\omega_{G/e}(\beta)$ if $e\in E$ is not a loop.

(c) $\omega_{B_n}(\beta) = 1+(2n-1)\beta$.

(d) $\omega_G(\beta) = \omega_{\mathrm{core}(G)}(\beta)$.

(e) $\omega_G(\beta)$ is a polynomial of degree $|V_{\mathrm{core}(G)}|$. The leading coefficient is $\prod_{i\in V_{\mathrm{core}(G)}}(d_i-1)$
and the constant term is 1.

(f) Let $G^{(m)}$ be the graph obtained by subdividing each edge into m edges. Then

$$\omega_{G^{(m)}}(\beta) = (1+\beta+\cdots+\beta^{m-1})^{|E_G|-|V_G|}\,\omega_G(\beta^m).$$

Proof. The assertions (a)-(e) are easy. (f) is proved by $|E_G|-|V_G| = |E_{G^{(m)}}|-|V_{G^{(m)}}|$ and
$\theta_{G^{(m)}}(\beta,2\sqrt{-1}) = \theta_G(\beta^m,2\sqrt{-1})$. □
Proposition 7.4. If G does not have connected components of nullity 0, then the coefficients
of $\omega_G(\beta)$ are non-negative.

Proof. We prove the assertion by induction on the number of edges. Assume that no
connected component is a tree. If G has only one edge, then $G = B_1$ and the coefficients
are non-negative. Let G have $M (\ge 2)$ edges and assume that the assertion holds for
graphs with at most M - 1 edges. It is enough to consider the case that G is a connected
coregraph because of Proposition 7.3 (a) and (d). If all the edges of G are loops, $G = B_n$
for some $n \ge 2$ and the coefficients are non-negative. If $G = C_M$, the coefficients are also
non-negative as in Example 8. Otherwise, we reduce $\omega_G$ to graphs with nullity not less than
1 by an application of the deletion-contraction relation and see that the coefficients of $\omega_{G\setminus e}$
and $\omega_{G/e}$ are both non-negative by the induction hypothesis. □

7.5.2 Relation to monomer-dimer partition function 

In the next theorem, we prove that the polynomial $\omega_G(\beta)$ is the monomer-dimer partition
function with a specific form of weights.

As defined previously, a matching of G is a set of edges no two of which share a vertex.
It is also called a dimer arrangement in statistical physics [53]. We
use both terminologies. The number of edges in a matching D is denoted by |D|. If a
matching D consists of k edges, then it is called a k-matching. The set of vertices covered by the
edges in D is denoted by [D]. The set of all matchings of G is denoted by $\mathcal{D}$.

The monomer-dimer partition function with edge weights $\mu = (\mu_e)_{e\in E}$ and vertex
weights $\lambda = (\lambda_i)_{i\in V}$ is defined by

$$\Xi_G(\mu,\lambda) := \sum_{D\in\mathcal{D}}\prod_{e\in D}\mu_e\prod_{i\in V\setminus[D]}\lambda_i.$$

We write $\Xi_G(\mu,\lambda)$ with a scalar $\mu$ if all weights $\mu_e$ are set to the same $\mu$.

Theorem 7.7. Let $\lambda_i := 1+(d_i-1)\beta$. Then

$$\omega_G(\beta) = \Xi_G(-\beta,\lambda).$$

Proof. We show that $\Xi_G(-\beta,\lambda)$ satisfies the deletion-contraction relation and the boundary
condition of the form in Proposition 7.3 (c). For the bouquet graph $B_n$, $D = \emptyset$ is the only
possible dimer arrangement, and thus

$$\Xi_{B_n}(-\beta,\lambda) = 1+(2n-1)\beta = \omega_{B_n}(\beta).$$

For a non-loop edge $e = i_0j_0$, we show that the deletion-contraction relation is satisfied. A
dimer arrangement $D\in\mathcal{D}$ is classified into the following five types: (a) D includes e, (b)
D does not include e and D covers both $i_0$ and $j_0$, (c) D covers $i_0$ but does not cover $j_0$,
(d) D covers $j_0$ but does not cover $i_0$, (e) D covers neither $i_0$ nor $j_0$. According to this
classification, $\Xi_G(-\beta,\lambda)$ is a sum of the five terms A, B, C, D and E. We see that

$$C = \sum_{D\in\mathcal{D},\ [D]\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\prod_{i\in V\setminus[D]}\lambda_i$$

$$= \sum_{D\in\mathcal{D},\ [D]\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\bigl(1+(d_{j_0}-2)\beta\bigr)\prod_{i\in V\setminus[D],\ i\neq j_0}\lambda_i + \beta\sum_{D\in\mathcal{D},\ [D]\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\prod_{i\in V\setminus[D],\ i\neq j_0}\lambda_i$$

$$=: C_1+\beta C_2.$$

In the same way, $D = D_1+\beta D_2$. Similarly,

$$E = \sum_{D\in\mathcal{D},\ [D]\not\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\prod_{i\in V\setminus[D]}\lambda_i$$

$$= \sum_{D\in\mathcal{D},\ [D]\not\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\bigl(1+(d_{j_0}-2)\beta\bigr)\bigl(1+(d_{i_0}-2)\beta\bigr)\prod_{i\in V\setminus[D],\ i\neq i_0,j_0}\lambda_i + \beta\sum_{D\in\mathcal{D},\ [D]\not\ni i_0,\ [D]\not\ni j_0}(-\beta)^{|D|}\bigl(2+(d_{i_0}+d_{j_0}-3)\beta\bigr)\prod_{i\in V\setminus[D],\ i\neq i_0,j_0}\lambda_i$$

$$=: E_1+\beta E_2.$$

We can straightforwardly check that

$$\Xi_{G\setminus e}(-\beta,\lambda') = B+C_1+D_1+E_1$$

and

$$\beta\,\Xi_{G/e}(-\beta,\lambda'') = A+\beta C_2+\beta D_2+\beta E_2, \tag{7.17}$$

where $\lambda'$ and $\lambda''$ are defined by the degrees of $G\setminus e$ and $G/e$ respectively. Note that $C_2+D_2$
in Eq. (7.17) corresponds to the dimer arrangements in $G/e$ that cover the new vertex
formed by the contraction. This shows the deletion-contraction relation. □
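
Theorem 7.7 gives a direct way to compute $\omega_G$ for small graphs: sum over matchings with vertex weights $\lambda_i = 1+(d_i-1)\beta$ and edge weight $-\beta$. The following sketch is only an illustration under assumed conventions (vertices `0..num_vertices-1`, edges as vertex pairs, a loop counted twice in the degree and never part of a matching); `poly_mul` and `omega_coeffs` are hypothetical helper names.

```python
from itertools import combinations


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def omega_coeffs(num_vertices, edges):
    """Coefficients [w_0, ..., w_|V|] of sum_D (-beta)^|D| prod_{i not in [D]} (1 + (d_i - 1) beta)."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = [0] * (num_vertices + 1)
    for size in range(num_vertices // 2 + 1):
        for subset in combinations(edges, size):
            covered = [x for e in subset for x in e]
            if len(set(covered)) != 2 * size:
                continue  # shared vertex or loop: not a matching
            term = [0] * size + [(-1) ** size]  # the monomial (-beta)^|D|
            for i in range(num_vertices):
                if i not in covered:
                    term = poly_mul(term, [1, degree[i] - 1])
            for k, c in enumerate(term):
                total[k] += c
    return total


# Assumed encodings of graphs from Example 8.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
X1 = [(0, 1), (0, 1), (0, 1)]
X2 = [(0, 0), (0, 1), (1, 1)]

print(omega_coeffs(4, K4))
print(omega_coeffs(2, X1))
print(omega_coeffs(2, X2))
```

The returned list always has length $|V|+1$, so graphs that are not coregraphs (for instance trees) come back with trailing zeros.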

Let $p_G(k)$ be the number of k-matchings of G. The matching polynomial $\alpha_G$ is defined
by

$$\alpha_G(x) = \sum_{k=0}^{\lfloor |V|/2\rfloor}(-1)^k p_G(k)\,x^{|V|-2k}.$$

The matching polynomial is essentially the monomer-dimer partition function with uniform
weights; if we set all vertex weights to $\lambda$ and all edge weights to $\mu$, we have

$$\Xi_G(\mu,\lambda) = \alpha_G\Bigl(\frac{\lambda}{\sqrt{-\mu}}\Bigr)\bigl(\sqrt{-\mu}\bigr)^{|V|}.$$

Therefore, for a (q + 1)-regular graph G, Theorem 7.7 implies

$$\omega_G(u^2) = \alpha_G\Bigl(\frac{1}{u}+qu\Bigr)u^{|V|}. \tag{7.18}$$

In [90], Nagle derives a sub-coregraph expansion of the monomer-dimer partition function
with uniform weights, or matching polynomials, on regular graphs. With a transform of
variables, his expansion theorem is essentially equivalent to Eq. (7.18). We can say that
Theorem 7.7 gives an extension of the expansion to non-regular graphs.

As an immediate consequence of Eq. (7.18), we remark on the symmetry of the
coefficients of $\omega_G$ for regular graphs.

Corollary 7.1. Let G be a (q + 1)-regular graph ($q \ge 1$) with N vertices and let $w_k$ be the
k-th coefficient of $\omega_G(\beta)$. Then we have

$$w_{N-k} = q^{N-2k}\,w_k \qquad \text{for } 0 \le k \le N.$$
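
The coefficient symmetry for regular graphs can be checked on the polynomials of Example 8. The sketch below assumes the relation in the form $w_{N-k} = q^{N-2k}w_k$, which is what Eq. (7.18) yields under the substitution $u \to 1/(qu)$; the check is rearranged to $w_{N-k}\,q^{2k} = q^N w_k$ so that it stays in integer arithmetic. The function name is hypothetical.

```python
def satisfies_symmetry(w, q):
    """Check w[N-k] * q^(2k) == q^N * w[k] for all k, i.e. w[N-k] == q^(N-2k) * w[k]."""
    n = len(w) - 1
    return all(w[n - k] * q ** (2 * k) == q ** n * w[k] for k in range(n + 1))


# Coefficient lists [w_0, ..., w_N] taken from Example 8; all three graphs are 3-regular (q = 2).
omega_K4 = [1, 2, 3, 8, 16]
omega_X1 = [1, 1, 4]
omega_X2 = [1, 3, 4]

for w in (omega_K4, omega_X1, omega_X2):
    print(w, satisfies_symmetry(w, 2))

# The cycle C_5 is 2-regular (q = 1): its coefficients are palindromic.
print(satisfies_symmetry([1, 0, 0, 0, 0, 1], 1))
```

A list that violates the relation, such as a perturbed copy of one of the above, makes the function return `False`.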
7.5.3 Zeros of $\omega_G(\beta)$

Physicists are interested in the complex zeros of partition functions, because their location
restricts the occurrence of phase transitions, i.e., discontinuities of physical quantities with respect to
parameters such as temperature. In the limit of infinite size of graphs, analyticity of the
scaled log partition function on a complex domain is guaranteed if there are no zeros in
the domain and some additional conditions hold. (See [134, 108].) For the monomer-dimer
partition function, Heilmann and Lieb [53] show the following result.

Theorem 7.8 ([53] Theorem 4.6). If $\mu_e \ge 0$ for all $e\in E$ and $\mathrm{Re}(\lambda_j) > 0$ for all $j\in V$,
then $\Xi_G(\mu,\lambda) \neq 0$. The same statement is true if $\mathrm{Re}(\lambda_j) < 0$ for all $j\in V$.

Since our polynomial $\omega_G(\beta)$ is a monomer-dimer partition function, we obtain a bound
on the region of complex zeros.

Corollary 7.2. Let G be a graph, let $d_m$ and $d_M$ be the minimum and maximum degree
in core(G) respectively, and assume that $d_m \ge 2$. If $\beta\in\mathbb{C}$ satisfies $\omega_G(\beta) = 0$, then

$$\frac{1}{d_M-1} \le |\beta| \le \frac{1}{d_m-1}.$$

Proof. Without loss of generality, we assume that G is a coregraph. Let $\beta = |\beta|e^{i\theta}$ satisfy
$\omega_G(\beta) = 0$, where $0 \le \theta < 2\pi$ and i is the imaginary unit. Since $\omega_G(0) = 1$ and the
coefficients of $\omega_G(\beta)$ are non-negative by Proposition 7.4, we have $\beta \neq 0$ and $\theta \neq 0$. We
see that

$$0 = \omega_G(\beta) = \Xi_G(-\beta,\lambda) = (ie^{-i\theta/2})^{-|V|}\,\Xi_G(|\beta|,\,ie^{-i\theta/2}\lambda),$$

where $\lambda_j = 1+(d_j-1)\beta$, and $\mathrm{Re}(ie^{-i\theta/2}\lambda_j) = (1-(d_j-1)|\beta|)\sin\frac{\theta}{2}$. If $|\beta|$ lay outside the
stated range, these real parts would be all positive or all negative, so the assertion follows
from Theorem 7.8. □

In particular, if the graph is a (q + 1)-regular graph, the roots are on the circle of radius
1/q, which is also directly seen from Eq. (7.18) combined with the famous result on the roots
of matching polynomials [53]: the zeros of matching polynomials are on the real interval
$(-2\sqrt{q},\,2\sqrt{q})$.

7.5.4 Determinant sum formula

Let $T := \{C\subset E \,;\, d_i(C) = 0 \text{ or } 2 \text{ for all } i\in V\}$ be the set of unions of vertex-disjoint
cycles. In this subsection, an element $C\in T$ is identified with the subgraph $(V_C, C)$, where
$V_C := \{i\in V \,;\, d_i(C) \neq 0\}$. A graph $G\setminus C$ is given by deleting all the vertices in $V_C$ and
the edges of G that are incident with them.

The aim of this subsection is Theorem 7.9, in which we represent $\omega_G$ as a sum of
determinants. This theorem is similar to the expansion of the matching polynomial by
characteristic polynomials

$$\alpha_G(x) = \sum_{C\in T}2^{k(C)}\det\bigl[xI-A_{G\setminus C}\bigr], \tag{7.19}$$

where $A_{G\setminus C}$ is the adjacency matrix of $G\setminus C$ and k(C) is the number of connected
components of C.

Theorem 7.9.

$$\omega_G(u^2) = \sum_{C\in T}2^{k(C)}\det\Bigl[I-uA_G+u^2(D_G-I)\Bigr]\Big|_{G\setminus C}\,u^{|C|}, \tag{7.20}$$

where $D_G$ is the degree matrix defined by $(D_G)_{ij} := d_i\delta_{ij}$ and $\cdot|_{G\setminus C}$ denotes the restriction
to the principal minor indexed by the vertices of $G\setminus C$.

Proof. For the proof, we use the result of Chernyak and Chertkov [23]. For given weights
$\mu = (\mu_e)_{e\in E}$ and $\lambda = (\lambda_i)_{i\in V}$, a $|V|\times|V|$ matrix H is defined by

$$H := \mathrm{diag}(\lambda)-\sum_{e\in E}\sqrt{-\mu_e}\,A_e,$$

where $A_e = E_{ij}+E_{ji}$ for $e = ij$ and $E_{ij}$ is the matrix base. In our notation, their result
implies

$$\Xi_G(\mu,\lambda) = \sum_{C\in T}2^{k(C)}\det H\big|_{G\setminus C}\prod_{e\in C}\sqrt{-\mu_e}.$$

If we set $\lambda_i = 1+(d_i-1)u^2$ and $\sqrt{-\mu_e} = u$, then the assertion follows. □

For regular graphs, Eqs. (7.19) and (7.20) are equivalent because of Eq. (7.18). It is
noteworthy that the matrix $(I-uA_G+u^2(D_G-I))$ is nothing but the matrix that appears
in the Ihara-Bass formula of the Ihara zeta function. It is also noteworthy that the region
of zeros in Corollary 7.2 resembles the region of poles of the Ihara zeta function derived from
Eq. (3.3).



Figure 7.2: Graph $X_3$ and possible arrangements on $X_3^{(2)}$.
7.5.5 Values at $\beta = 1$

The value of $\omega_G(1)$ is interpreted as the cardinality of a set constructed from G. For the
following theorem, recall that $G^{(2)}$ is obtained by adding a vertex on each edge in G = (V, E).
The vertices of $G^{(2)} := (V^{(2)}, E^{(2)})$ are classified into $V_O$ and $V_A$, where $V_O$ is the set of original
vertices and $V_A$ is the set of newly added ones. The set of matchings on $G^{(2)}$ is denoted by $\mathcal{D}_{G^{(2)}}$.

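Theorem 7.10.

$$\omega_G(1) = \bigl|\{D\in\mathcal{D}_{G^{(2)}} \,;\, [D]\supset V_O\}\bigr|.$$

Proof. From Theorem 7.6, we have

$$\omega_G(1) = \sum_{\substack{s\subset E,\ s = G_1\cup\cdots\cup G_{k(s)} \\ n(G_j) = 1 \text{ for } j = 1,\ldots,k(s)}} 2^{k(s)}, \tag{7.21}$$

where $G_j$ is a connected component of (V, s). We construct a map F from $\{D\in\mathcal{D}_{G^{(2)}} \,;\, [D]\supset V_O\}$
to subsets $s\subset E$ as

$$F(D) := \{e\in E \,;\, \text{a half of } e \text{ is covered by an edge in } D\}.$$

Then the nullity of each connected component of F(D) is 1 and $|F^{-1}(s)| = 2^{k(s)}$. □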
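
The counting interpretation of Theorem 7.10 can be tested directly on small graphs. In the sketch below, which is an illustration under assumed conventions rather than the construction used in the proof, every edge of $G^{(2)}$ joins an original vertex to the midpoint of an edge of G, so a matching covering all original vertices amounts to choosing one incident half-edge per original vertex with no midpoint used twice. The count is compared with the value $\sum_{D}(-1)^{|D|}\prod_{i\in V\setminus[D]}d_i$ that Theorem 7.7 gives at $\beta = 1$. Function names and graph encodings are hypothetical.

```python
from itertools import combinations, product


def count_covering_matchings(num_vertices, edges):
    """Number of matchings of G^(2) that cover every original vertex of G."""
    half_edges = [[] for _ in range(num_vertices)]
    for idx, (u, v) in enumerate(edges):
        half_edges[u].append((idx, 0))  # half of edge idx attached to u
        half_edges[v].append((idx, 1))  # half of edge idx attached to v (a loop has both at u)
    count = 0
    for choice in product(*half_edges):
        midpoints = [idx for idx, _ in choice]
        if len(set(midpoints)) == num_vertices:  # no added vertex is used twice
            count += 1
    return count


def omega_at_one(num_vertices, edges):
    """Evaluate sum over matchings D of (-1)^|D| * prod_{i not covered} d_i."""
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = 0
    for size in range(num_vertices // 2 + 1):
        for subset in combinations(edges, size):
            covered = [x for e in subset for x in e]
            if len(set(covered)) != 2 * size:
                continue  # not a matching
            term = (-1) ** size
            for i in range(num_vertices):
                if i not in covered:
                    term *= degree[i]
            total += term
    return total


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
C3 = [(0, 1), (1, 2), (2, 0)]
for n_vertices, edges in [(4, K4), (3, C3)]:
    print(count_covering_matchings(n_vertices, edges), omega_at_one(n_vertices, edges))
```

The agreement of the two numbers is the inclusion-exclusion argument in miniature: the product of degrees counts all choices of half-edges, and the matchings D correct for pairs of vertices that pick the two halves of the same edge.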

Example 9. For the graph $X_3$ in Figure 7.2, $\omega_{X_3}(1) = \omega_{C_3}(1) = 2$. The corresponding
arrangements are also shown in Figure 7.2.

In the end, we remark on the relations between the results on $\omega_G(1)$ obtained in this
paper. From Proposition 7.3, $\omega_G(1)$ satisfies

$$\omega_G(1) = \omega_{G\setminus e}(1)+\omega_{G/e}(1) \quad \text{if } e\in E \text{ is not a loop}.$$

This relation can be directly observed from the interpretation of Theorem 7.10. Theorem
7.7 gives

$$\omega_G(1) = \sum_{D\in\mathcal{D}}(-1)^{|D|}\prod_{i\in V\setminus[D]}d_i,$$

which can be proved from Theorem 7.10 with the inclusion-exclusion principle. Theorem
7.9 gives

$$\omega_G(1) = \sum_{C\in T}2^{k(C)}\det\bigl[D_G-A_G\bigr]\big|_{G\setminus C}.$$

We can directly prove this formula from Eq. (7.21) using a kind of matrix-tree theorem.
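
The determinant sum at $\beta = 1$ is small enough to evaluate by hand or by machine. The following sketch is a hedged illustration: it assumes the formula $\omega_G(1) = \sum_{C\in T}2^{k(C)}\det[D_G-A_G]|_{G\setminus C}$, counts a loop as 2 on the diagonal of $A_G$ (matching $A_e = E_{ij}+E_{ji}$ with $i = j$), and takes the determinant of the empty matrix to be 1. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination over the rationals (1 for the 0x0 matrix)."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result


def count_components(vertices, subset):
    """Number of connected components of the graph (vertices, subset), via union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in subset:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})


def omega_at_one_by_determinants(num_vertices, edges):
    degree = [0] * num_vertices
    adjacency = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacency[u][v] += 1
        adjacency[v][u] += 1  # for a loop both updates hit the same diagonal entry
    total = Fraction(0)
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            deg_c = [0] * num_vertices
            for u, v in subset:
                deg_c[u] += 1
                deg_c[v] += 1
            if any(d not in (0, 2) for d in deg_c):
                continue  # not a union of vertex-disjoint cycles
            on_cycles = [i for i in range(num_vertices) if deg_c[i] == 2]
            rest = [i for i in range(num_vertices) if deg_c[i] == 0]
            k = count_components(on_cycles, subset)
            minor = [[(degree[i] if i == j else 0) - adjacency[i][j] for j in rest] for i in rest]
            total += 2 ** k * det(minor)
    return total


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(omega_at_one_by_determinants(4, K4))
```

For $K_4$ the empty set contributes the determinant of the graph Laplacian, which is zero, so the whole value comes from the four triangles and the three 4-cycles.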

7.6 Discussion 

In this chapter, we analyzed the LS ignoring the relations between the weights $\beta$ and $\gamma$.
In other words, we treated the LS as a weighted graph polynomial $\Theta_G(\beta,\gamma)$. Under this
treatment, we derived strict bounds on the number of sub-coregraphs, which scales with the
nullity of graphs. We also showed that $\Theta_G$ satisfies a deletion-contraction relation, assuming
that the vertex weights on the ends of the contracted edge are the same. Though the result does
not have direct implications for the properties of the Bethe approximation, it demonstrates
the rich mathematical structure of the LS.

Specializing $\Theta_G(\beta,\gamma)$, we introduced two graph polynomials and elucidated their
properties. These are new instances of Tutte's V-function, allowing an alternative sum expression
with respect to all the subgraphs. For the univariate graph polynomial $\omega_G$, we found
interesting properties such as the relation to the monomer-dimer partition function and a slight
connection to the Ihara zeta function.



Chapter 8 



Conclusion 



8.1 Conclusion 

In this thesis, we analyzed mathematical properties of the loopy belief propagation algorithm with
an emphasis on the graph geometry. Exact inference on a graph requires "global
computation," which is computationally intractable, whereas approximate inference by the LBP
algorithm only requires "local computation." The global/local discrepancy is the origin of
the approximation errors of the LBP algorithm. The gap between the global and the local
disappears if the graph geometry is trivial, i.e., if the graph is a tree.

This concept is not restricted to the LBP algorithm. In fact, we often encounter "global"
computational problems that are approximated by "local" computations such as message
passing algorithms on graphs. Obviously, the max-product algorithm, which gives exact
results for maximization problems associated with trees, has the same difficulty.

In Part I, we introduced the graph zeta function and showed the Bethe-zeta formula.
Since the LBP fixed points are characterized in terms of the Bethe free energy function, the
graph geometry should be reflected in the function. The Bethe-zeta formula claims that
the graph zeta function is the key quantity that reflects the graph geometry in the context
of the LBP algorithm.

The novel relationship between LBP, or the Bethe free energy function, and the graph
zeta function provides new techniques for the analysis of the properties of LBP and the
Bethe free energy function. We demonstrated applications of the techniques in this thesis.
For example, we showed that the region where the Hessian of the Bethe free energy function
is positive definite is related to the nearest pole of the Ihara zeta function. We also showed that locally stable
fixed points of LBP are local minima of the Bethe free energy function. For a certain class
of models on graphs with nullity two, the uniqueness of the LBP fixed point is proved by
checking positivity of the graph zeta function.

Since the relationship between LBP and the Bethe free energy was clarified by Yedidia et al.
[135], many variants of the LBP algorithm have been proposed based on this understanding.
We believe that our new relation to the graph zeta function also opens the door to
future developments or improvements of the LBP algorithm.

In Part II, we investigated the loop series. Since the loop series is a sum with
respect to sub-coregraphs, the form of the loop series expansion reflects the graph geometry.
In fact, it is equal to 1 if the underlying graph is a tree. Our analysis was basically focused
on the expression itself, leaving aside the relations between the weights $\beta$ and $\gamma$.

We analyzed mathematical properties of $\Theta_G$ and showed interesting properties such as
the deletion-contraction relation. We also showed a partial connection between the loop series
and the graph zeta function. However, many problems are left regarding the connection.

8.2 Suggestions for future research

This section suggests possible extensions and developments of our analysis. 

8.2.1 Variants and extensions of the Bethe-zeta formula 

As mentioned in an earlier section, there are many variants and extensions of the LBP algorithm.
Accordingly, it is natural to think of variants and extensions of the Bethe-zeta formula.

Fractional belief propagation: This extension is possible, using the Bartholdi type graph
zeta function. This extension will be discussed in a future paper.

Generalized belief propagation: Another possible direction of the extension of the
formula is Generalized Belief Propagation (GBP). We have not considered this extension.
The zeta function that appears in this extension may be interesting from the viewpoint of
combinatorics. It may also help to prove or disprove the statement: "locally stable fixed points of GBP are
local minima of the Kikuchi free energy."

Expectation propagation: We can also think of the extension to expectation
propagation. In this method, local exponential families are glued together by the local consistency
condition of expectations of sufficient statistics. In the proof of the Bethe-zeta formula, the key
property was $\mathrm{Var}_{b_\alpha}[\phi_i] = \mathrm{Var}_{b_i}[\phi_i]$, which is not guaranteed by the consistency of
expectations. Therefore, the extension of the Bethe free energy is not apparent.

8.2.2 Dynamics and convergence of LBP algorithm

In Chapter 5, we developed a new approach to show the uniqueness of the LBP fixed point.
An interesting question is how we can extend our approach to show the convergence of the
LBP algorithm. By definition, the convergence property is stronger than the uniqueness
property. However, in the binary pairwise case, the uniqueness condition in Corollary 5.1 also
guarantees the convergence [57].

One vague suggestion for approaching the dynamics and the convergence of LBP is
to consider graph covers. A cover $\tilde G$ of a graph G is a graph having a map $\pi$ to G that
is a surjection and a local isomorphism. The Ihara zeta function has a rich connection with
graph covers [110, 111]. For example, it is well known that $\zeta_G(u)^{-1}$ divides $\zeta_{\tilde G}(u)^{-1}$ [84].

The uniqueness and convergence of LBP are also related to graph covers. Obviously, the
uniqueness on $\tilde G$ with compatibility functions induced from G guarantees the uniqueness
on G. The uniqueness of the Gibbs measure on the universal covering tree, i.e., the infinite-depth
computation tree, guarantees the convergence of the LBP algorithm on G [119]. These
fragmentary facts suggest further developments of theories on graph zeta functions, graph
covers and the dynamics of the LBP algorithm.

Finally, it is noteworthy that the Ihara zeta function has an interpretation as a dynamical
zeta function. In general, dynamical zeta functions encode information on periodic points
of the given dynamical systems [7]. It is known that the Ihara zeta function is the dynamical
zeta function of a certain symbolic dynamical system derived from the graph [76]. It would be
interesting to pursue the relation between this dynamical system and the LBP algorithm.

8.2.3 Other research related to LBP and graph zeta function

Some recent research has suggested the importance of zeta functions. In the context of
LDPC codes, which are an important application of LBP, Koetter et al. have shown the
connection between pseudo-codewords and the edge zeta function [74, 75]. Though a graph
zeta function appears there, our result and their result are basically different. In the field of
codes, parity check constraints are considered. Thus the compatibility functions take the value
zero. In contrast, we considered arbitrary positive compatibility functions in applications
of the Bethe-zeta formula. Compatibility functions with zero values are related to faces of
the closure of L, and limits of the Bethe free energy function toward faces are not obvious.

For Gaussian belief propagation, Johnson et al. [69] give a zeta-like product formula
for the partition function.

An implicit reason for the appearance of zeta functions is the local nature of message
passing algorithms. Local operations do not distinguish covering graphs from the original
graph in some sense, and graph zeta functions are intimately related to graph covers. Though
these works are not directly related to our work, from such viewpoints, pursuing the connections
is an interesting future research topic.

Useful theorems



A.1 Linear algebraic formulas

Basic notation is as follows. The set of $n_1\times n_2$ matrices is denoted by $M(n_1,n_2)$. For a
square matrix X, the set of eigenvalues is denoted by Spec(X) and the spectral radius, i.e.,
the maximum modulus of the eigenvalues, is denoted by $\rho(X)$.

Theorem A.1 ([57]). Let $X = (x_{ij})$ and $Y = (y_{ij})$ be non-negative matrices satisfying
$x_{ij} \le y_{ij}$ $(i,j = 1,\ldots,n)$. Then $\rho(X) \le \rho(Y)$.

Theorem A.2 ([57]). Let $X = (x_{ij})\in M(n,n)$ be a non-negative matrix. Then

$$\min_{1\le j\le n}\sum_{i=1}^{n}x_{ij} \le \rho(X) \le \max_{1\le j\le n}\sum_{i=1}^{n}x_{ij} \tag{A.1}$$

and

$$\min_{1\le i\le n}\sum_{j=1}^{n}x_{ij} \le \rho(X) \le \max_{1\le i\le n}\sum_{j=1}^{n}x_{ij}. \tag{A.2}$$

Definition 15. Let $X = (x_{ij})\in M(n,n)$ be a non-negative matrix and let $G_X$ be the directed
graph consisting of vertices $V = \{1,\ldots,n\}$ and directed edges $j\to i$ for $x_{ij} \neq 0$. The matrix
X is irreducible if $G_X$ is strongly connected, i.e., for each ordered pair $(i,j)\in V\times V$, there
is a directed walk from i to j.

Theorem A.3 (Perron-Frobenius theorem [57]). Let X be a non-negative matrix of size n.
Then $\rho(X)$ is an eigenvalue of X having a non-negative eigenvector. The eigenvalue is called
the Perron-Frobenius eigenvalue of X. Furthermore, if X is irreducible, $\rho(X)$ is positive and
simple, and has a positive eigenvector.

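Proposition A.1 (Schur complement). Let $X \in M(n, n)$ and let $Y$ be its inverse. The blocks
of sizes $n_1$ and $n_2$ ($n = n_1 + n_2$) are denoted by

\[ X = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_{11} & Y_{12} \\ Y_{21} & Y_{22} \end{pmatrix}. \]

Then

\[ X_{11}^{-1} = Y_{11} - Y_{12} Y_{22}^{-1} Y_{21}, \tag{A.3} \]

\[ \det X = \det X_{11} \det Y_{22}^{-1}. \tag{A.4} \]

Proof. It is trivial that

\[ \begin{pmatrix} I & -Y_{12} Y_{22}^{-1} \\ 0 & I \end{pmatrix} Y = \begin{pmatrix} Y_{11} - Y_{12} Y_{22}^{-1} Y_{21} & 0 \\ Y_{21} & Y_{22} \end{pmatrix}. \]

Multiplying by $X$ from the right, we obtain Eq. (A.3). Eq. (A.4) is derived by taking the
determinant of the above identity. □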
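
As a sanity check on Eqs. (A.3) and (A.4), the block identities can be evaluated on a random instance. This is only a sketch: the block sizes, the seed and the choice of a symmetric positive definite test matrix (which guarantees that $X$, $X_{11}$ and $Y_{22}$ are invertible) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 3
n = n1 + n2

# Hypothetical test matrix: symmetric positive definite, so every block
# that gets inverted below is invertible.
G = rng.normal(size=(n, n))
X = G @ G.T + np.eye(n)
Y = np.linalg.inv(X)

X11 = X[:n1, :n1]
Y11, Y12 = Y[:n1, :n1], Y[:n1, n1:]
Y21, Y22 = Y[n1:, :n1], Y[n1:, n1:]

# Eq. (A.3): the inverse of the top-left block of X is a Schur complement in Y.
schur = Y11 - Y12 @ np.linalg.inv(Y22) @ Y21
eq_A3_holds = bool(np.allclose(np.linalg.inv(X11), schur))

# Eq. (A.4): det X = det X_11 * det(Y_22)^(-1).
eq_A4_holds = bool(np.isclose(np.linalg.det(X),
                              np.linalg.det(X11) / np.linalg.det(Y22)))
```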



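Proposition A.2. For $A \in M(n, m)$ and $B \in M(m, n)$,

\[ \det(I_n - AB) = \det(I_m - BA). \tag{A.5} \]

Proof. Take the determinant of the following identity:

\[ \begin{pmatrix} I_n - AB & 0 \\ B & I_m \end{pmatrix} \begin{pmatrix} I_n & -A \\ 0 & I_m \end{pmatrix} = \begin{pmatrix} I_n & -A \\ 0 & I_m \end{pmatrix} \begin{pmatrix} I_n & 0 \\ B & I_m - BA \end{pmatrix}. \tag{A.6} \]

□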
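
The determinant identity, and the block factorization used to prove it, can be confirmed on rectangular random matrices. The sizes and the seed below are hypothetical; the sketch only illustrates the statement and is not a substitute for the proof.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 5
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, n))
I_n, I_m = np.eye(n), np.eye(m)

# Both sides of det(I_n - AB) = det(I_m - BA).
lhs_det = np.linalg.det(I_n - A @ B)
rhs_det = np.linalg.det(I_m - B @ A)
dets_agree = bool(np.isclose(lhs_det, rhs_det))

# The block identity whose determinant gives the result.
left = np.block([[I_n - A @ B, np.zeros((n, m))], [B, I_m]])
unit = np.block([[I_n, -A], [np.zeros((m, n)), I_m]])      # determinant 1
right = np.block([[I_n, np.zeros((n, m))], [B, I_m - B @ A]])
identity_holds = bool(np.allclose(left @ unit, unit @ right))
```

In applications the identity is typically used to trade a determinant of size $n$ for one of size $m$, which is worthwhile when the two sizes differ greatly.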



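Proposition A.3. Let $X$ be a $d \times d$ matrix of the form

\[ X = \begin{pmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{pmatrix}. \tag{A.7} \]

Then $\det X = (a - b)^{d-1}(a + (d-1)b)$ and

\[ X^{-1} = \frac{1}{(a - b)(a + (d-1)b)} \begin{pmatrix} a + (d-2)b & -b & \cdots & -b \\ -b & a + (d-2)b & \cdots & -b \\ \vdots & \vdots & \ddots & \vdots \\ -b & -b & \cdots & a + (d-2)b \end{pmatrix}. \tag{A.8} \]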
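
A quick numerical instance of Proposition A.3 follows. The values of `d`, `a` and `b` are arbitrary illustrative choices with $a \neq b$ and $a + (d-1)b \neq 0$, so that the inverse exists.

```python
import numpy as np

d, a, b = 5, 2.0, 0.5
ones = np.ones((d, d))
eye = np.eye(d)

# X has a on the diagonal and b everywhere else, as in Eq. (A.7).
X = (a - b) * eye + b * ones

det_formula = (a - b) ** (d - 1) * (a + (d - 1) * b)
det_matches = bool(np.isclose(np.linalg.det(X), det_formula))

# Eq. (A.8): diagonal entries a + (d-2)b, off-diagonal entries -b, common prefactor.
inv_formula = ((a + (d - 2) * b) * eye - b * (ones - eye)) / ((a - b) * (a + (d - 1) * b))
inverse_matches = bool(np.allclose(X @ inv_formula, eye))
```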



A.2 On probability distributions

Proposition A.4. Let $x$, $y$ be vector valued random variables following a probability
distribution $p$. If $\mathrm{Var}_p[(x, y)]$ is regular, then

\[ \| \mathrm{Cor}_p[y, x] \|_2 < 1, \tag{A.9} \]

where $\mathrm{Cor}_p[y, x]$ is the correlation coefficient matrix and $\| \cdot \|_2$ is the norm induced by the
inner product. (Definitions are found in Subsection 4.3.2.)

Proof. From the assumption of this proposition, $\mathrm{Var}_p[y]$ and $\mathrm{Var}_p[x]$ are both regular and
the correlation coefficient matrix given by Eq. (4.16) is well defined. For arbitrary vectors $a$
and $b$,

\[ \langle b, \mathrm{Cov}_p[y, x] a \rangle = \mathrm{E}_p\bigl[ b^T (y - \mathrm{E}[y]) (x - \mathrm{E}[x])^T a \bigr] \le \sqrt{\langle b, \mathrm{Var}[y] b \rangle} \sqrt{\langle a, \mathrm{Var}[x] a \rangle}, \]

where we used Schwarz's inequality. This inequality must be strict for all nonzero $a$ and $b$
because of the regularity of $\mathrm{Var}[(x, y)]$. Using the above inequality, we obtain

\[ \| \mathrm{Cor}_p[y, x] \|_2 = \max_{b \neq 0, a \neq 0} \frac{\langle b, \mathrm{Cov}_p[y, x] a \rangle}{\sqrt{\langle b, \mathrm{Var}[y] b \rangle} \sqrt{\langle a, \mathrm{Var}[x] a \rangle}} < 1. \]

□
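
The bound (A.9) can be illustrated numerically. The definition of the correlation coefficient matrix lives in Subsection 4.3.2 and is not reproduced in this appendix, so the sketch assumes the form $\mathrm{Cor}[y, x] = \mathrm{Var}[y]^{-1/2} \mathrm{Cov}[y, x] \mathrm{Var}[x]^{-1/2}$, which is the one consistent with the maximization in the proof. The dimensions, the seed and the helper `inv_sqrt` are hypothetical.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive definite matrix."""
    w, U = np.linalg.eigh(S)
    return U @ np.diag(w ** -0.5) @ U.T

rng = np.random.default_rng(2)
dx, dy = 2, 3

# Hypothetical regular (positive definite) joint covariance of (x, y).
G = rng.normal(size=(dx + dy, dx + dy))
V = G @ G.T + 0.1 * np.eye(dx + dy)

Var_x = V[:dx, :dx]
Var_y = V[dx:, dx:]
Cov_yx = V[dx:, :dx]

# Assumed definition of the correlation coefficient matrix (see lead-in).
Cor_yx = inv_sqrt(Var_y) @ Cov_yx @ inv_sqrt(Var_x)
operator_norm = float(np.linalg.norm(Cor_yx, 2))   # largest singular value
```

Here `operator_norm` plays the role of $\| \mathrm{Cor}_p[y, x] \|_2$; for a regular joint covariance it stays strictly below one.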
B.1 Inference on tree

The following formula gives the covariance of the sufficient statistics on separated vertices of
a tree. The expression involves covariances of neighboring vertices and inverted variances.
It is interesting that this type of expression also appears in the linearization of the LBP
update in Theorem 2.2 and in the Bethe-zeta formula.

Proposition B.1. Let $\mathcal{I} = \{\mathcal{E}_\alpha, \mathcal{E}_i\}$ be an inference family on a tree structured factor graph
$H = (V, F)$. Let $p$ be the probability distribution obtained by a graphical model satisfying
the assumption stated in Subsection 2.2.3. For vertices $i, j$ of $H$, let
$(i = i_1, \alpha_1, i_2, \alpha_2, \dots, \alpha_n, j = i_{n+1})$ be the unique walk from $i$ to $j$ that satisfies
$\alpha_l \neq \alpha_{l+1}$. We have

\[ \mathrm{Cov}_p[\phi_j, \phi_i] = \mathrm{Cov}_{p_{\alpha_n}}[\phi_{i_{n+1}}, \phi_{i_n}] \, \mathrm{Var}_{p_{i_n}}[\phi_{i_n}]^{-1} \cdots \mathrm{Cov}_{p_{\alpha_2}}[\phi_{i_3}, \phi_{i_2}] \, \mathrm{Var}_{p_{i_2}}[\phi_{i_2}]^{-1} \, \mathrm{Cov}_{p_{\alpha_1}}[\phi_{i_2}, \phi_{i_1}], \]

where $p_{i_l}$ and $p_{\alpha_l}$ are marginal distributions of $p$, and $\phi_i$ are the sufficient statistics of the
exponential families.

Proof. Consider a tree $H = (V, F)$ given by $V = \{1, 2, 3\}$ and $F = \{\alpha = \{1, 2\}, \beta = \{2, 3\}\}$.
We compute the covariance of $\phi_1$ and $\phi_3$ on this graph. Other cases are reduced to this
case. Thus what we have to show is

\[ \mathrm{Cov}[\phi_1, \phi_3] = \mathrm{Cov}[\phi_1, \phi_2] \, \mathrm{Var}[\phi_2]^{-1} \, \mathrm{Cov}[\phi_2, \phi_3]. \tag{B.1} \]

Let us define a subfamily of the global exponential family that includes the given graphical
model:

\[ p(x_1, x_2, x_3; \theta_1, \theta_2, \theta_3) = \exp\bigl( \theta_1 \phi_1(x_1) + \theta_{\langle 12 \rangle} \phi_{\langle 12 \rangle}(x_1, x_2) + \theta_2 \phi_2(x_2) + \theta_{\langle 23 \rangle} \phi_{\langle 23 \rangle}(x_2, x_3) + \theta_3 \phi_3(x_3) - \psi(\theta_1, \theta_2, \theta_3) \bigr). \]

From the assumption, the given distribution $p$ is equal to $p(\theta)$ for some $\theta = (\theta_1, \theta_2, \theta_3)$.
The expectation parameters are denoted by $(\eta_1, \eta_2, \eta_3)$. The variances and covariances are
computed by the derivatives of the log partition function:

\[ \mathrm{Cov}[\phi_i, \phi_j] = \frac{\partial^2 \psi}{\partial \theta_i \partial \theta_j}. \]

Claim. Let $\varphi$ be the Legendre transform of $\psi$. Then

\[ \frac{\partial^2 \varphi}{\partial \eta_1 \partial \eta_3} = 0. \tag{B.2} \]



Proof of claim. Local exponential families $\mathcal{E}_{12}$, $\mathcal{E}_2$ and $\mathcal{E}_{23}$ are denoted by

\[ b_{12}(x_1, x_2; \theta_{12:1}, \theta_{12:2}) = \exp\bigl( \theta_{12:1} \phi_1(x_1) + \theta_{12:2} \phi_2(x_2) + \theta_{\langle 12 \rangle} \phi_{\langle 12 \rangle}(x_1, x_2) - \psi_{12}(\theta_{12:1}, \theta_{12:2}) \bigr), \]

\[ b_2(x_2; \theta'_2) = \exp\bigl( \theta'_2 \phi_2(x_2) - \psi_2(\theta'_2) \bigr), \]

\[ b_{23}(x_2, x_3; \theta_{23:2}, \theta_{23:3}) = \exp\bigl( \theta_{23:2} \phi_2(x_2) + \theta_{23:3} \phi_3(x_3) + \theta_{\langle 23 \rangle} \phi_{\langle 23 \rangle}(x_2, x_3) - \psi_{23}(\theta_{23:2}, \theta_{23:3}) \bigr). \]

The dual parameter sets are denoted by $(\eta_{12:1}, \eta_{12:2})$, $\eta'_2$ and $(\eta_{23:2}, \eta_{23:3})$. As usual, we are
assuming that the inference family satisfies Assumptions 1 and 2 (see Subsections 2.2.1
and 2.2.2).

If we set

\[ \eta_{12:1} = \eta_1, \quad \eta'_2 = \eta_{12:2} = \eta_{23:2} = \eta_2 \quad \text{and} \quad \eta_{23:3} = \eta_3, \]

we see that $b_{12} = p_{12}$, $b_2 = p_2$ and $b_{23} = p_{23}$. On the other hand, we have

\[ p(x_1, x_2, x_3) = \frac{p_{12}(x_1, x_2) \, p_{23}(x_2, x_3)}{p_2(x_2)} \tag{B.3} \]

because $H$ is a tree. From Eqs. (B.3) and (2.9), we derive

\[ \varphi(\eta_1, \eta_2, \eta_3) = \varphi_{12}(\eta_1, \eta_2) + \varphi_{23}(\eta_2, \eta_3) - \varphi_2(\eta_2). \]

From this equation, the assertion of this claim is immediately proved. □

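Let us go back to the proof of Proposition B.1. Using standard results presented in
Subsection 2.2.1, it is easy to see that

\[ \sum_{j} \frac{\partial^2 \psi}{\partial \theta_i \partial \theta_j} \frac{\partial^2 \varphi}{\partial \eta_j \partial \eta_k} = \delta_{i,k}. \]

Setting $(i, k) = (1, 3), (2, 3)$ and using Eq. (B.2), we obtain

\[ \mathrm{Cov}[\phi_1, \phi_2] \frac{\partial^2 \varphi}{\partial \eta_2 \partial \eta_3} + \mathrm{Cov}[\phi_1, \phi_3] \frac{\partial^2 \varphi}{\partial \eta_3 \partial \eta_3} = 0, \]

\[ \mathrm{Var}[\phi_2] \frac{\partial^2 \varphi}{\partial \eta_2 \partial \eta_3} + \mathrm{Cov}[\phi_2, \phi_3] \frac{\partial^2 \varphi}{\partial \eta_3 \partial \eta_3} = 0. \]

We obtain Eq. (B.1) from these equations. □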
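
Eq. (B.1) can be verified by brute force on the three-vertex chain used in the proof. The sketch below makes simplifying assumptions that are not in the original text: the variables are binary, the sufficient statistics are scalar, $\phi_i(x_i) = x_i$, and the two pairwise factors `factor_12` and `factor_23` are random positive tables chosen only for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical positive factors on the chain x1 - x2 - x3 (a tree).
factor_12 = rng.uniform(0.5, 2.0, size=(2, 2))
factor_23 = rng.uniform(0.5, 2.0, size=(2, 2))

# Joint distribution p(x1, x2, x3) proportional to factor_12[x1, x2] * factor_23[x2, x3].
p = np.einsum('ab,bc->abc', factor_12, factor_23)
p /= p.sum()

states = list(itertools.product((0, 1), repeat=3))

def mean(f):
    """Expectation of f(x) under p by explicit enumeration."""
    return sum(p[x] * f(x) for x in states)

def cov(i, j):
    """Covariance of the statistics phi_i(x) = x_i and phi_j(x) = x_j (0-based indices)."""
    return mean(lambda x: x[i] * x[j]) - mean(lambda x: x[i]) * mean(lambda x: x[j])

lhs = cov(0, 2)                               # Cov[phi_1, phi_3]
rhs = cov(0, 1) / cov(1, 1) * cov(1, 2)       # Cov[phi_1, phi_2] Var[phi_2]^{-1} Cov[phi_2, phi_3]
tree_formula_holds = bool(np.isclose(lhs, rhs))
```

In this scalar setting the matrix inverse reduces to a division by the variance of the middle variable; for vector valued statistics the same check would use matrix products in the order given in Eq. (B.1).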



B.2 The Hessian of $\mathcal{F}$

This section derives Eq. (4.2) by calculating the second derivatives of $\mathcal{F}$. Recall that the
domain of the type 2 Bethe free energy function $\mathcal{F}$ is
A(l, *) := {6 = {9 a , 0;}|0 (a) = 9 {a) , ^ a -.i = (1 - d^ft + ^ ^ V « S i 7 , v i € a}. 

We take $\{\theta_{\alpha:i}\}_{\alpha \in F, i \in \alpha}$ as a set of free parameters and thus the type 2 Bethe free energy
function Eq. (jXiOl) is 

We introduce another coordinate $\{\xi_{\alpha:i}\}_{\alpha \in F, i \in \alpha}$ by






APPENDIX B. MISCELLANEOUS FACTS ON LBP 



The first derivatives of $\mathcal{F}$ are



E 



= {-Va:i + Vi 



The second derivatives are 

d 2 T 



E 



dr] a -i Org 
d6 Tj 86. 



T-J 



^2 f -Sa,yC<Wa[<l>j,<i>i] + Var [fa] ( - - - 

-Cov 7 [(pj,(j)i} if 7 e Ni \ j j= i, 

- Var 7 [cpi] + Var; {&] if j = i, 7 ^ /3, 

Varj[0j] if j* = i,7 = /3, 

otherwise. 



Note that this matrix of second derivatives is indexed by the directed edge set $\vec{E}$ because
the coordinates being differentiated are indexed by the directed edges $(\gamma \to j)$ and $(\beta \to i)$, respectively.
At an LBP fixed point, $\mathrm{Var}_\gamma[\phi_j] = \mathrm{Var}_j[\phi_j]$ holds. Therefore,



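\[ \frac{\partial^2 \mathcal{F}}{\partial \xi \, \partial \xi} = \mathrm{diag}\bigl( \mathrm{Var}[\phi_{t(e)}] \mid e \in \vec{E} \bigr) \, [\, I - \mathcal{M}(u) \,], \tag{B.5} \]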



where $u = \{u^{\alpha}_{i \to j}\}$ is given by Eq. (4.3). The above equation is nothing but Eq. (4.2).



B.3 Convexity of the Bethe free energy function 

Theorem B.l. For any inference model I on a factor graph H with nullity n(H) = 0, 1, 
the Bethe free energy function F is convex on L(I). 

Proof. This proof is a modification of Corollary 1 in [55], where only multinomial cases are
considered. First, we prove that $\varphi_\alpha(\eta_\alpha) - \varphi_i(\eta_i)$ is convex for all $i \in \alpha$. More precisely, we
prove the positive semi-definiteness of the Hessian. From $\mathrm{Var}_{b_i}[\phi_i] = \mathrm{Var}_{b_\alpha}[\phi_i]$ and
Proposition A.1, we have



Accordingly, 3X 



d 2 <fi d 2 (f a d 2 ip a ( d 2 (f a \ 1 d 2 



dr]idr]i dr]idr]i %<9r?/ a ) \drj {a) dr] (a) J dr] {a) di' 



X 1 V% a (r? a ) - <pi(TH))X 







Therefore, the Hessian of $\varphi_\alpha(\eta_\alpha) - \varphi_i(\eta_i)$ is positive semidefinite.
The rest of the proof is the same as Theorem 1 and Corollary 1 of [55].



(B.6) 



(B.7) 



□ 



Bibliography 



[1] T. Adachi and T. Sunada. Twisted Perron-Frobenius theorem and L-functions. Journal
of Functional Analysis, 71(1):1-46, 1987.

[2] L.V. Ahlfors. Complex analysis. McGraw Hill, 1966. 

[3] S.M. Aji and R.J. McEliece. The generalized distributive law. IEEE Transactions on 
Information Theory, 46(2):325-343, 2000. 

[4] S.I. Amari and H. Nagaoka. Methods of information geometry. American Mathemat- 
ical Society, 2000. 

[5] S.A. Amitsur. On the characteristic polynomial of a sum of matrices. Linear Multi- 
linear Algebra, 8:177-182, 1980. 

[6] G. An. A note on the cluster variation method. Journal of Statistical Physics, 
52(3):727-734, 1988. 

[7] M. Artin and B. Mazur. On periodic points. Annals of Mathematics, 81(1):82-99,
1965. 

[8] F. Barahona. On the computational complexity of Ising spin glass models. Journal 
of Physics A: Mathematical and General, 15:3241-3253, 1982. 

[9] O.E. Barndorff-Nielsen. Information and exponential families in statistical theory, 
1978. 

[10] L. Bartholdi. Counting paths in graphs. Enseign. Math., II. Sér., 45(1-2):83-131,
1999. 






[11] H. Bass. The Ihara-Selberg zeta function of a tree lattice. International Journal of 
Mathematics, 3(6):717-797, 1992. 

[12] C. Berge. Graphs and hypergraphs. North-Holland, 1976. 

[13] C. Berrou, A. Glavieux, and P. Thitimajshima. Near Shannon limit error correcting 
coding and decoding: Turbo-codes. In Proc. of IEEE International Conference on 
Communications, pages 1064-1070, 1993. 

[14] M. Bertero and P. Boccacci. Super-resolution in computational imaging. Micron, 
34(6-7):265-273, 2003.

[15] H.A. Bethe. Statistical theory of superlattices. Proc. R. Soc. Lon. A, 150(871):552-575,
1935.

[16] D. Bickson, Y. Tock, O. Shental, and D. Dolev. Polynomial linear programming with 
Gaussian belief propagation. In the 46th Allerton Conf. on Communications, Control 
and Computing, 2008. 

[17] B. Bollobás. Modern graph theory. Springer Verlag, 1998.

[18] B. Bollobás and O. Riordan. A Tutte polynomial for coloured graphs. Combinatorics,
Probability and Computing, 8(1&2):45-93, 1999.

[19] S. Borman and R.L. Stevenson. Super-resolution from image sequences-a review, 
volume 5, 1998. 

[20] J.M. Borwein and A.S. Lewis. Convex analysis and nonlinear optimization: theory 
and examples. Springer Verlag, 2006. 

[21] L.D. Brown. Fundamentals of statistical exponential families: with applications in 
statistical decision theory, volume 9 of Lecture notes-monograph series. IMS, 1986. 

[22] V. Chandrasekaran, M. Chertkov, D. Gamarnik, D. Shah, and J. Shin. Counting 
independent sets using the Bethe approximation. Preprint, 2009.

[23] V.Y. Chernyak and M. Chertkov. Fermions and loops on graphs: II. A monomer- 
dimer model as a series of determinants. Journal of Statistical Mechanics: Theory 
and Experiment, 12:P12012, 2008. 






[24] M. Chertkov and V.Y. Chernyak. Loop calculus in statistical physics and information 
science. Phys. Rev. E, 73:65102, 2006. 

[25] M. Chertkov and V.Y. Chernyak. Loop series for discrete statistical models on graphs. 
Journal of Statistical Mechanics: Theory and Experiment, 6:P06009, 2006. 

[26] M. Chertkov, V.Y. Chernyak, and R. Teodorescu. Belief propagation and loop se- 
ries on planar graphs. Journal of Statistical Mechanics: Theory and Experiment, 
2008:P05003. 

[27] M. Chertkov, L. Kroc, and M. Vergassola. Belief propagation and beyond for particle 
tracking. Preprint arXiv:0806.1199, 2008.

[28] F.R.K. Chung. Spectral graph theory. American Mathematical Society, 1997. 

[29] G.F. Cooper. The computational complexity of probabilistic inference using Bayesian 
belief networks. Artificial intelligence, 42:393-405, 1990. 

[30] R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter. Probabilistic networks
and expert systems. Springer Verlag, 1999.

[31] P. Dagum and M. Luby. Approximate probabilistic reasoning in Bayesian belief net- 
works is NP-hard. Artificial Intelligence, 60:141-153, 1993. 

[32] B.A. Dubrovin, A.T. Fomenko, S.P. Novikov, and R.G. Burns. Modern geometry:
methods and applications: Part 2: the geometry and topology of manifolds. Springer- 
Verlag, 1985. 

[33] R. Durbin, S.R. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: 
Probabilistic models of proteins and nucleic acids. Cambridge Univ. Pr., 1998. 

[34] J. Ellis-Monaghan and C. Merino. Graph polynomials and their applications I: The 
Tutte polynomial. Structural Analysis of Complex Networks, 2009. 

[35] J. Ellis-Monaghan and C. Merino. Graph polynomials and their applications II: In- 
terrelations and interpretations. Structural Analysis of Complex Networks, 2009. 

[36] P. Exner, J.P. Keating, and P. Kuchment. Analysis on graphs and its applications.
American Mathematical Society, 2008. 






[37] D. Foata and D. Zeilberger. A combinatorial proof of Bass's evaluations of the Ihara- 
Selberg zeta function for graphs. Transactions of the American Mathematical Society, 
351(6):2257-2274, 1999. 

[38] C.M. Fortuin and P.W. Kasteleyn. On the random-cluster model. I. Introduction and
relation to other models. Physica, 57:536-564, 1972. 

[39] W.T. Freeman, E.C. Pasztor, and O.T. Carmichael. Learning low-level vision. Int. J. 
Comput. Vision, 40(1):25-47, 2000.

[40] R. Gallager. Low-density parity-check codes. IEEE Trans. Inf. Theory, 8(1):21-28,
1962. 

[41] A.E. Gelfand and A.F.M. Smith. Sampling-based approaches to calculating marginal 
densities. Journal of the American statistical association, 85(410):398-409, 1990. 

[42] H.O. Georgii. Gibbs measures and phase transitions. Walter de Gruyter, 1988. 

[43] C. Godsil and G. Royle. Algebraic graph theory. Springer New York, 2001. 

[44] C.D. Godsil and I. Gutman. On the theory of the matching polynomial. J. Graph
Theory, 5:137-144, 1981. 

[45] G.H. Golub and C.F. Van Loan. Matrix computations. Johns Hopkins Univ. Pr.,
1996. 

[46] V. Gómez, H.J. Kappen, and M. Chertkov. Approximate inference on planar graphs
using loop calculus and belief propagation. Preprint arXiv:0901.0786, 2009.

[47] G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable 
queries. Journal of Computer and System Sciences, 64(3):579-627, 2002. 

[48] G.R. Grimmett. A theorem about random fields. Bulletin of the London Mathematical 
Society, 5(13):81-84, 1973. 

[49] J. Guckenheimer and P. Holmes. Nonlinear oscillations, dynamical systems, and 
bifurcations of vector fields. Springer, 1990. 

[50] M.D. Gupta, S. Rajaram, N. Petrovic, and T.S. Huang. Non-parametric image super- 
resolution using multiple images. In IEEE International Conference on Image Pro- 
cessing, volume 2, 2005. 






[51] K. Hashimoto. Zeta functions of finite graphs and representations of p-adic groups. 
Automorphic forms and geometry of arithmetic varieties, 15:211-280, 1989. 

[52] K. Hashimoto. On zeta and L-functions of finite graphs. Internat. J. Math, 1(4):381-396,
1990.

[53] O.J. Heilmann and E.H. Lieb. Theory of monomer-dimer systems. Communications 
in Mathematical Physics, 25:190-232, 1972. 

[54] T. Heskes. Stable fixed points of loopy belief propagation are minima of the Bethe 
free energy. Adv. in Neural Information Processing Systems, 15:343-350, 2002. 

[55] T. Heskes. On the uniqueness of loopy belief propagation fixed points. Neural Com- 
putation, 16(11):2379-2413, 2004. 

[56] T. Heskes, M. Opper, W. Wiegerinck, O. Winther, and O. Zoeter. Approximate 
inference techniques with expectation constraints. Journal of Statistical Mechanics: 
Theory and Experiment, 2005:P11015. 

[57] R.A. Horn and C.R. Johnson. Matrix analysis. Cambridge University Press, 1990.

[58] M.D. Horton, H.M. Stark, and A.A. Terras. Zeta Functions of weighted graphs and
covering graphs. Analysis on Graphs and Its Applications, 77:29, 2008. 

[59] G. Hua, M.H. Yang, and Y. Wu. Learning to estimate human pose with data driven 
belief propagation. In IEEE Computer Society Conference on Computer Vision and 
Pattern Recognition, volume 2, 2005. 

[60] Y. Ihara. On discrete subgroups of the two by two projective linear group over p-adic 
fields. Journal of the Mathematical Society of Japan, 18(3):219-235, 1966. 

[61] A.T. Ihler, J.W. Fisher, and A.S. Willsky. Loopy belief propagation: Convergence and
effects of message errors. Journal of Machine Learning Research, 6(1):905-936, 2006.

[62] S. Ikeda, T. Tanaka, and S. Amari. Information geometrical framework for analyzing 
belief propagation decoder. Adv. in Neural Information Processing Systems, 14:407, 
2002. 






[63] S. Ikeda, T. Tanaka, and S. Amari. Information geometry of turbo and low-density 
parity-check codes. IEEE Transactions on Information Theory, 50(6):1097-1114, 
2004. 

[64] S. Ikeda, T. Tanaka, and S. Amari. Stochastic reasoning, free energy, and information 
geometry. Neural Computation, 16(9):1779-1810, 2004.

[65] S. Iwao. Bartholdi zeta functions for hypergraphs. The electronic journal of combi- 
natorics, 14:N2, 2007. 

[66] T.S. Jaakkola. Tutorial on variational approximation methods. Advanced mean field 
methods, pages 129-159, 2000. 

[67] F. Jelinek. Statistical methods for speech recognition. MIT press, 1999. 

[68] M. Jerrum and A. Sinclair. Polynomial time approximations for the Ising Model. 
SIAM J. Computing, 22(5):1087-1116, 1993.

[69] J.K. Johnson, V.Y. Chernyak, and M. Chertkov. Orbit-Product Representation and 
Correction of Gaussian Belief Propagation. In Proc. of the 26th International Con- 
ference on Machine Learning. ACM New York, 2009. 

[70] M.I. Jordan. Learning in graphical models. Kluwer Academic Publishers, 1998. 

[71] M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, and L.K. Saul. An introduction to 
variational methods for graphical models. Machine learning, 37(2):183-233, 1999.

[72] R. Kikuchi. A theory of cooperative phenomena. Physical Review, 81(6):988-1003, 
1951. 

[73] R. Kindermann and J.L. Snell. Markov random fields and their applications. American 
Mathematical Society Providence, 1980. 

[74] R. Koetter, W.C.W. Li, P.O. Vontobel, and J.L. Walker. Pseudo-codewords of cycle
codes via zeta functions. IEEE Information Theory Workshop, pages 7-12, 2004. 

[75] R. Koetter, W.C.W. Li, P.O. Vontobel, and J.L. Walker. Characterizations of
pseudo-codewords of (low-density) parity-check codes. Advances in Mathematics,
213(1):205-229, 2007.






[76] M. Kotani and T. Sunada. Zeta functions of finite graphs. Journal of Mathematical 
Sciences, University of Tokyo, 7(1):7-25, 2000.

[77] F.R. Kschischang, B.J. Frey, and H.A. Loeliger. Factor graphs and the sum-product 
algorithm. IEEE Transactions on information theory, 47(2):498-519, 2001. 

[78] D.J.C. MacKay. Good error-correcting codes based on very sparse matrices. IEEE 
Trans. Inform. Theory, 45(2):399-431, 1999. 

[79] D.M. Malioutov, J.K. Johnson, and A.S. Willsky. Walk-sums and belief propagation 
in Gaussian graphical models. The Journal of Machine Learning Research, 7:2064, 
2006. 

[80] J.P. May. A concise course in algebraic topology. University of Chicago Press, 1999.

[81] R.J. McEliece, D.J.C. MacKay, and J.F. Cheng. Turbo decoding as an instance of Pearl's
"belief propagation" algorithm. IEEE J. Sel. Areas Commun., 16(2):140-152, 1998.

[82] M. Mezard, G. Parisi, and R. Zecchina. Analytic and algorithmic solution of random 
satisfiability problems. Science, 297(5582):812, 2002.

[83] T.P. Minka. Expectation propagation for approximate Bayesian inference. In Proc. of 
the 17th Conference in Uncertainty in Artificial Intelligence, pages 362-369. Morgan 
Kaufmann Publishers, 2001. 

[84] H. Mizuno and I. Sato. Zeta functions of graph coverings. Journal of Combinatorial 
Theory, Series B, 80(2):247-257, 2000. 

[85] H. Mizuno and I. Sato. Weighted zeta functions of graphs. Journal of Combinatorial 
Theory, Series B, 91(2):169-183, 2004.

[86] A. Montanari and T. Rizzo. How to compute loop corrections to the Bethe approxi- 
mation. Journal of Statistical Mechanics: Theory and Experiment, 2005:P10011. 

[87] J.M. Mooij and H.J. Kappen. Sufficient conditions for convergence of the sum-
product algorithm. IEEE Transactions on Information Theory, 53(12):4422-4437, 
2007. 






[88] J.M. Mooij and H.J. Kappen. On the properties of the Bethe approximation and loopy 
belief propagation on binary networks. Journal of Statistical Mechanics: Theory and 
Experiment, 11:P11012, 2005. 

[89] K. Murphy, Y. Weiss, and M.I. Jordan. Loopy belief propagation for approximate 
inference: An empirical study. Proc. of Uncertainty in Artificial Intelligence, 15:467- 
475, 1999. 

[90] J.F. Nagle. New series-expansion method for the dimer problem. Phys. Rev., 152:190- 
197, 1966. 

[91] J.F. Nagle. A new subgraph expansion for obtaining coloring polynomial for graphs. 
Journal of Combinatorial Theory, Series B, 10:42-59, 1971. 

[92] H. Nishimori. Statistical physics of spin glasses and information processing: an intro- 
duction. Oxford University Press, USA, 2001. 

[93] Y. Nishiyama and S. Watanabe. Accuracy of loopy belief propagation in Gaussian 
models. Neural Networks, 22(4):385-394, 2009. 

[94] S. Northshield. A note on the zeta function of a graph. Journal of Combinatorial 
Theory, Series B, 74(2):408-410, 1998. 

[95] P. Pakzad and V. Anantharam. Belief propagation and statistical physics. Conference 
on Information Sciences and Systems, 2002. 

[96] G. Parisi. Statistical field theory. Addison-Wesley, 1988.

[97] G. Parisi and F. Slanina. Loop expansion around the Bethe-Peierls approxima- 
tion for lattice models. Journal of Statistical Mechanics: Theory and Experiment, 
2006:L02003. 

[98] J. Pearl. Reverend Bayes on inference engines: A distributed hierarchical approach.
In Proc. of the AAAI National Conference on AI, pages 133-136, 1982. 

[99] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. 
Morgan Kaufmann Publishers, San Mateo, CA, 1988. 

[100] A. Pelizzola. Cluster variation method in statistical physics and probabilistic graphical 
models. Journal of Physics A: Mathematical General, 38(33):R309-R339, 2005.






[101] W.W. Peterson and E.J. Weldon. Error-correcting codes. The MIT Press, 1972.

[102] L. Saul and M.I. Jordan. Exploiting tractable substructures in intractable networks. 
Adv. in Neural Information Processing Systems, pages 486-492, 1995. 

[103] J.P. Serre. Trees. Springer-Verlag, 1980. 

[104] P.P. Shenoy and G. Shafer. Axioms for probability and belief-function propagation. 
In Classic Works of the Dempster-Shafer Theory of Belief Functions, pages 499-528. 
Springer, 2008. 

[105] O. Shental, P.H. Siegel, J.K. Wolf, D. Bickson, and D. Dolev. Gaussian belief prop- 
agation solver for systems of linear equations. In IEEE International Symposium on 
Information Theory, pages 1863-1867, 2008. 

[106] A. Simsek, A. Ozdaglar, and D. Acemoglu. Uniqueness of generalized equilibrium for 
box constrained problems and applications. In Proc. Allerton, 2005. 

[107] A. Simsek, A. Ozdaglar, and D. Acemoglu. Generalized Poincaré-Hopf theorem for
compact nonsmooth regions. Mathematics of Operations Research, 32(1):193, 2007. 

[108] A.D. Sokal. Bounds on the complex zeros of (di)chromatic polynomials and Potts- 
model partition functions. Combinatorics, Probability and Computing, 10:41-77, 2001. 

[109] J.R. Stallings. Topology of finite graphs. Invent. Math., 71:551-565, 1983.

[110] H.M. Stark and A.A. Terras. Zeta functions of finite graphs and coverings. Advances
in Mathematics, 121(1):124-165, 1996. 

[111] H.M. Stark and A.A. Terras. Zeta functions of finite graphs and coverings, part II.
Advances in Mathematics, 154(1):132-195, 2000. 

[112] C.K. Storm. The zeta function of a hypergraph. The electronic journal of combina- 
torics, 13:R84, 2006. 

[113] E.B. Sudderth, M.J. Wainwright, and A.S. Willsky. Loop series and Bethe varia- 
tional bounds in attractive graphical models. Adv. in Neural Information Processing 
Systems, 20:1425-1432, 2008. 






[114] T. Sunada. Riemannian coverings and isospectral manifolds. The Annals of Mathe- 
matics, 121(1):169-186, 1985. 

[115] T. Sunada. L-functions in geometry and some applications. Lecture Notes in Math, 
1201:266-284, 1986. 

[116] T. Sunada. Discrete geometric analysis. Analysis on Graphs and Its Applications, 
77:51, 2008. 

[117] K. Tanaka. Statistical-mechanical approach to image processing. Journal of Physics 
A: Mathematical and General, 35(37):R81-R150, 2002. 

[118] A. Tang, J. Wang, S.H. Low, and M. Chiang. Equilibrium of heterogeneous congestion 
control: Existence and uniqueness. IEEE/ACM Transactions on Networking (TON), 
15(4):837, 2007. 

[119] S. Tatikonda and M.I. Jordan. Loopy belief propagation and Gibbs measures. Uncer- 
tainty in AI, 18:493-500, 2002. 

[120] W.T. Tutte. A ring in graph theory. Proc. Camb. Phil. Soc., 43:26-44, 1947.

[121] K. Ueno, K. Shiga, T. Sunada, E. Tyler, and S. Morita. A mathematical gift: the 
interplay between topology, functions, geometry, and algebra. American Mathematical 
Society, 2005. 

[122] M.J. Wainwright. Stochastic processes on graphs with cycles: geometric and varia- 
tional approaches. PhD thesis, Massachusetts Institute of Technology, 2002. 

[123] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky. Tree-based reparameterization 
framework for analysis of sum-product and related algorithms. IEEE Transactions 
on Information Theory, 49(5):1120-1146, 2003. 

[124] M.J. Wainwright and M.I. Jordan. Variational inference in graphical models: The view 
from the marginal polytope. In Proc. of the Allerton Conference on Communication, 
Control, and Computing, volume 41, pages 961-971, 2003. 

[125] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and
variational inference. Foundations and Trends in Machine Learning, 1(1-2):1-305, 2008.






[126] Y. Watanabe and K. Fukumizu. Graph zeta function in the Bethe free energy and 
loopy belief propagation. Adv. in Neural Information Processing Systems, 22:2017- 
2025, 2009. 

[127] Y. Watanabe and K. Fukumizu. Loop series expansion with propagation diagrams. 
Journal of Physics A: Mathematical and Theoretical, 42(4):045001, 2009. 

[128] Y. Weiss. Correctness of local probability propagation in graphical models with loops. 
Neural Computation, 12(1):1-41, 2000. 

[129] Y. Weiss and W.T. Freeman. Correctness of belief propagation in Gaussian graphical 
models of arbitrary topology. Neural Computation, 13(10):2173-2200, 2001. 

[130] M. Welling and Y.W. Teh. Belief optimization for binary networks: A stable alternative
to loopy belief propagation. In Uncertainty in Artificial Intelligence, 2001.

[131] D.J.A. Welsh. Complexity: knots, colourings and counting. Cambridge Univ. Pr.,
1993. 

[132] J. Whittaker. Graphical models in applied multivariate statistics. Wiley Publishing, 
2009. 

[133] W. Wiegerinck and T. Heskes. Fractional belief propagation. Adv. in Neural Infor- 
mation Processing Systems, 15:455-462, 2003. 

[134] C.N. Yang and T.D. Lee. Statistical theory of equations of state and phase transitions.
I: Theory of condensation. Physical Review, 87(3):404-409, 1952. 

[135] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Generalized belief propagation. Adv. in 
Neural Information Processing Systems, 13:689-95, 2001. 

[136] J.S. Yedidia, W.T. Freeman, and Y. Weiss. Constructing free-energy approximations 
and generalized belief propagation algorithms. IEEE Transactions on Information 
Theory, 51(7):2282-2312, 2005. 

[137] A.L. Yuille. CCCP algorithms to minimize the Bethe and Kikuchi free energies: 
Convergent alternatives to belief propagation. Neural Computation, 14(7):1691-1722, 
2002. 



